INTRODUCTION


This manual lists the error messages that the
SPERRY 5000 Series Operating Systems return to the
console. Along with each message, there is a
description of the conditions leading to the
message, what corrective action (if any) you can take,
and references to system documentation providing
more information on the command or action in progress
when the message was generated.


Organization 


The book separates the messages according to system. 
Within each system section, messages are 
alphabetized. There is supplemental information at 
the end of each of these sections. 


Problem Reporting 


Some of the conditions indicated by error messages 
can be corrected or removed so that you can continue 
using your system. Many of these conditions, 
however, will require that you report the problem to 
the Sperry Support Center. The Center’s hours are
from 8:00 a.m. Eastern time to 5:00 p.m. Pacific time.
The toll-free number is (800) 328-0440.


When you dial this number, you will be asked to enter
one of the three-digit codes below to identify the type
of system you have. This will route your call to the
appropriate group of specialists.


System Code 
5000/20 057 
5000/40 057 
5000/50 054 
5000/60 058 
5000/80 058 
5000/90 048 


Due to the number of systems we support, it is 
important that you follow this procedure. It is the 
most efficient and least time consuming way to direct 
attention to your problem.


Unique versus Recurring Problems


Conditions that generate messages can be divided into
two broad categories: those that occur once and those
that occur repeatedly.


Unique conditions may be troublesome, but often are 
not worth the time and effort it takes to report them. 
If your system locks up or crashes and returns a
message to the console, you will of course want to save 
as much information on the problem as possible. But if 
the state does not recur, it is reasonable to assume 
that something odd or unique precipitated the 
condition. There is no way to predict, and few ways 
to prevent, something like a brownout in your 
neighborhood. 


Recurring errors, however, are a different story. 
When a condition repeatedly ties up or brings down 
your system, then you must remove that condition so 
you can continue work. For a recurrent problem, you
will need this manual to gather information on it, to
try solving it, and finally to report it. Before you call
the toll-free number above, try rebooting, if called
for (most conditions can be removed by rebooting), or
solving the problem by other means. If, for instance,
a message says that there is not enough swap space in
the area to which you directed a swap, you can try
increasing the swap space available on your system.


Memory Dump and Analysis 


For problems you cannot correct by these means and 
have to report to the Support Center, make notes on 
the conditions leading to the problem, and the 
message(s) returned when the system snagged. 


If you are using a 5000/20, 5000/40 or 5000/50 
Operating System, this book will sometimes instruct 
you to dump the system to disk. The contents of a 
system dump can often be analyzed, making it possible 
to recreate the conditions leading to the crash and 
determine their causes. 


The rule of thumb is: for any problem you cannot fix 
and which is troubling your system, collect as much 
information as you can, then call the Support Center. 
Some conditions will already have been reported and a 
fix will exist on them. New problems will need to be 
reported, solved and documented. 


Updating 


Because 5000 Series systems are constantly being 
enhanced and developed, this manual will be updated 
often so it can follow these changes and remain 
complete and accurate. To find out if you have the 
most current revision of this book, contact the 
Support Center. If you encounter messages or 
conditions not found in the most current revision, 
please submit a description to your representative so 
that we can add the information to future revisions. 


MESSAGES FOR THE 5000/20, 5000/40 AND 5000/50


Introduction 


The operating system reports errors to you in two 
ways: 


* By displaying the error messages described in
this manual on the system console.


* By recording detailed error codes in the system
error log. Using the error report utility
errpt(1M), you can extract and expand the codes
and display the appropriate error message.


The operating system uses only one mechanism to 
report any errors it detects in any of these three 
subsystems: 


* operating system kernel software--errors detected
in the kernel are reported to the console and 
recorded in the error log. 


* communication device software (asynchronous and 
synchronous drivers--messages for asynchronous 
ports a and b are not written to the console) 


* mass storage device (disk, flexible disk, and tape) 
driver software 


As soon as mass storage devices detect read or write 
errors, they report them to the console and record 
them in the log. The console receives a brief message 
which shows only the Major Number of the "Character 
Special Device" node for the device producing the 
error, its minor number, and the attempted operation 
retry count. 


Console error messages for mass storage devices are
in this standard format:


Block device error, device = <major number>, <minor number>
Recovered/Unrecovered READ/WRITE error, Retry count = %d


You can get more information on a block device error 
by using the major and minor numbers and the 
following chart to determine which device issued the 
error. The Device Name column shows what value
you must supply to the error report utility errpt(1M)
in order to display more detailed information.


Major      Device                          Device
Number     Type                            Name

7          st506 5.25" disk driver         hs01, hs02
           and flex driver                 fs01, fs02

8          5.25" streaming tape driver     rtp, rtpl

22         scsi disk subsystem             sd01...sd34

23         scsi tape subsystem             ss61...ss41


To use errpt(1M), enter: 
errpt -d <device name> 


using <device name> from the table above. 
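

The lookup described above can be sketched in a few lines of Python. This is only an illustration, not part of the operating system: the function name, the regular expression, and the sample message are assumptions, and the sketch maps a major number to the device type given in the chart rather than to a specific device name.

```python
import re

# Major numbers and device types, taken from the chart in this section.
DEVICE_TYPES = {
    7: 'st506 5.25" disk and flex driver',
    8: '5.25" streaming tape driver',
    22: "scsi disk subsystem",
    23: "scsi tape subsystem",
}

# Assumed pattern for the two-line console message; the whitespace
# between the two parts is allowed to vary.
MESSAGE_RE = re.compile(
    r"Block device error, device = (\d+), (\d+),?\s+"
    r"(Recovered|Unrecovered) (READ|WRITE) error, Retry count = (\d+)"
)


def parse_block_device_error(text):
    """Return the fields of a block device error message, or None."""
    match = MESSAGE_RE.search(text)
    if match is None:
        return None
    major, minor, status, operation, retries = match.groups()
    return {
        "major": int(major),
        "minor": int(minor),
        "recovered": status == "Recovered",
        "operation": operation,
        "retry_count": int(retries),
        "device_type": DEVICE_TYPES.get(int(major), "unknown"),
    }


# Hypothetical sample message, made up for this example.
sample = (
    "Block device error, device = 22, 3\n"
    "Unrecovered WRITE error, Retry count = 4"
)
print(parse_block_device_error(sample))
```

Once the device type is known, choose the matching device name from the chart and supply it to errpt(1M) as shown above.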


Use the table of contents at the beginning of this 
section to find the page number of a message you want 
to research. 


Some panics may originate from various device
drivers, providing an indication of the offending
driver. These messages normally are descriptive
enough that no further explanation is necessary.
Because these messages are self-explanatory and often
are added to or otherwise changed, they are not
contained in this book.


At the end of the Error Messages section, there is a 
complete list of 5.25-Inch Tape Driver Error
Messages. Other entries in the text deal with tape 
problems, but this list contains all the codes you might 
see. 


Following this is an example of the TTY System
Error Log and explanations of its entries.


at mfpr regno = %x


MEANING 


The kernel’s integrity has been violated. 


PROBABLE CAUSE 


None: this should never happen. 


SUGGESTED ACTION


Generate a system dump and reboot the system as
described in crash(8) in the Administrator Reference
Manual.


at mtpr regno = %x


MEANING 


The kernel’s integrity has been violated. 


PROBABLE CAUSE 


None: this should never happen. 


SUGGESTED ACTION


Generate a system dump and reboot the system as
described in crash(8) in the Administrator Reference
Manual.


at ymapit with problem


MEANING 


The kernel’s integrity has been violated. 


PROBABLE CAUSE 


None: this should never happen. 


SUGGESTED ACTION


Generate a system dump and reboot the system as
described in crash(8) in the Administrator Reference
Manual.


attempt to unmap kernel


MEANING 


The kernel’s integrity has been violated. A panic or 
degraded situation will occur. This message is in a 
routine that sets up DMA buffering. This routine does
not return a status, so the caller proceeds as if the
buffer mapping was successful.


PROBABLE CAUSE 


None: this should never happen. 


SUGGESTED ACTION 


Generate a system dump and reboot the system as 
described in crash(8) in the Administrator Reference 
Manual. Perform the dump as soon as
possible so that subsequent system activity does
not erase memory associated with this error.


attempt to unmap kernel at dumapit


MEANING 


The kernel’s integrity has been violated. 


PROBABLE CAUSE 


None: this should never happen. 


SUGGESTED ACTION


Generate a system dump and reboot the system as
described in crash(8) in the Administrator Reference
Manual.


Block device error, device = %d, %d,
Unrecovered READ/WRITE error, Retry count = %d


MEANING 


This is one of the most common messages. It is 
displayed during error logging of a block device error 
to provide immediate notification of a media problem. 
The device major/minor numbers and attempted retry 
count provide supplemental information. 


PROBABLE CAUSE 


If the major and minor numbers indicate a disk or 
diskette drive, then a sector may be inaccessible due 
to disk aging or a power failure. This message also 
may indicate that the diskette is damaged, incorrectly 
mounted, or not yet formatted. 


SUGGESTED ACTION


If a disk or diskette sector has become inaccessible,
save all the files on the disk or diskette, reformat it,
and reinstall the saved files.


cache parity interrupt- trap 244, msp = %x


MEANING 


A cache memory parity error has been detected and 
corrected. 


PROBABLE CAUSE 


Hardware failure.


SUGGESTED ACTION


If this condition occurs frequently, your system
performance may suffer. Have the cache replaced.


CANNOT FLUSH IN-CORE LOG RECORDS


MEANING


In-core log records created during initialization
device testing cannot be flushed to the start-up
subsystem (SUS) log.


PROBABLE CAUSE 


Memory boards incorrectly seated. 


SUGGESTED ACTION


Bring the system down. Turn off power to the
system. Reseat the memory boards, restore power,
and reboot the system.


Can’t allocate message buffer


MEANING 


At system initialization time, the kernel found that too 
much memory was being allocated for messages, 
rendering the message facility unusable. 


PROBABLE CAUSE 


The amount of memory configured for messages 
exceeds the amount of memory currently available in 
the machine. 


SUGGESTED ACTION 


Check the msgseg and msgssz entries in the system 
description file. msgseg is the number of segments to 
allocate and msgssz is the size each segment should 
be. The product of these numbers (msgseg and 
msgssz) is the amount of memory to allocate for 
messages. If this amount exceeds the amount of 
memory currently available in the machine, the above 
message appears when you boot the system. 
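

The check described above is a single multiplication and comparison, sketched here in Python. The function names and the sample values are hypothetical, and the sketch assumes that msgssz and the available memory are expressed in the same unit (bytes here); consult config(1M) for the actual units and defaults on your system.

```python
def message_pool_size(msgseg, msgssz):
    """Memory requested for messages: number of segments times segment size."""
    return msgseg * msgssz


def message_pool_fits(msgseg, msgssz, available):
    """True if the requested message memory does not exceed what is available."""
    return message_pool_size(msgseg, msgssz) <= available


# Hypothetical values, chosen only to show the arithmetic.
msgseg = 1024          # number of segments (assumed)
msgssz = 8             # size of each segment, assumed to be in bytes
available = 4096       # memory currently available, assumed to be in bytes

print(message_pool_size(msgseg, msgssz))
print(message_pool_fits(msgseg, msgssz, available))
```

With these made-up values the requested amount is larger than the available memory, which is the condition that produces the message at boot time.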


Rebuild the kernel specifying fewer and/or smaller 
message segments. See config(1M) in the 
Administrator Reference Manual for a discussion of 
the msgseg and msgssz parameters, and for a 
description of how to build and boot a kernel. 


If none of these attempts succeed, more memory must 
be added to the machine. 
Can’t fork loader for /dev/hp%x


MEANING 


This message is sent by /etc/ldhpsio. The process
executing /etc/ldhpsio cannot fork a copy of itself to
download the board specified. This message is 
followed by a message from perror that interprets the 
errno status. 


PROBABLE CAUSE 


It is possible that the process table is full. However, 
this process usually runs immediately when going from 
single-user to multi-user mode. The number of active 
processes would be very low at this time. Fork may 
also fail if the system has insufficient memory. This 
message may indicate a memory error. 


SUGGESTED ACTION 


Invoke the sar(1) command with the -v option to 
determine if the process table is full. If the table is 
full, reduce the workload or rebuild the kernel 
specifying a larger process table. See config(1M) in 
the Administrator Reference Manual for a discussion 
of the parameter procs, and for a description of how to 
build and boot a kernel. 
Can’t open /etc/hpsio.out


MEANING 


This message is sent by /etc/ldhpsio. The download
object file (usually /etc/hpsio.out) cannot be opened. 
This message is followed by a message from perror 
that interprets the errno status. 


PROBABLE CAUSE 


Since the file /etc/hpsio.out exists on a newly 
installed system, it may have been removed or had its 
permissions changed. Otherwise, an entry in the file 
/etc/inittab invoking /etc/ldhpsio may have been 
corrupted or inadvertently changed. 


SUGGESTED ACTION


Check to make sure the file was specified correctly.
Check the file to make sure its access permissions are
correct. Check /etc/inittab for a correct invocation of
/etc/ldhpsio.


clock race interrupt- trap 224


MEANING 


An on-board (PMC) interrupt disappeared while being
serviced.


PROBABLE CAUSE


This can be caused by a free-running clock not being
serviced properly, but the condition is so rare it
should never occur.


SUGGESTED ACTION 


Generate a system dump and reboot the system as 
described in crash(8) in the Administrator Reference 
Manual. 
cpinit: gate didn’t open


MEANING


The tape driver timed out while waiting for the
no-operation (NOP) command used to clear interrupts to
complete.


PROBABLE CAUSE 

The control cable to the drive may not be connected, 
or the drive itself may be faulty. 


SUGGESTED ACTION


Check the drive’s control cable. If the cable is
connected properly, suspect the drive itself of being
faulty. Check the hardware, or contact the Support
Center and arrange to have it checked.
DANGER: mfree map overflow %x lost %y items at %z


MEANING 


The kernel has detected that the specified map array 
is full and that an unsuccessful attempt was made to 
free a resource. 


PROBABLE CAUSE 


This typically results from fragmentation of the 
resource managed with the map array. 


SUGGESTED ACTION 


Rebuild the kernel specifying a larger map size for the
appropriate map (swapmap, msgmap, or semmap).
The map overflow value (%x) can be used to identify
the particular kernel map that overflowed. It is
possible, however, that the map is not one of
the three listed above. See config(1M) in the
Administrator Reference Manual for a discussion of 
the various kernel map parameters and for a 
description of how to build and boot a kernel. 
DANGER: out of swap space needed %d blocks


MEANING 


The kernel found insufficient space on the swap 
device when attempting to swap out a process or a 
copy of a pure text image. The kernel was unable to 
reclaim any swap space. There is a real danger of 
system deadlock when you see this message. 


PROBABLE CAUSE 


The system has not been tuned to match its workload, 
so the workload is taxing the system’s swap space 
capacity. 


SUGGESTED ACTION 


Either reduce the workload or increase the swap space
to accommodate the workload. Before increasing the
swap space, back up all files that have been created or
changed since the system was last installed. To
increase the swap space, follow the procedure
described in the Installation and Verification Guide.
/dev/hp%x ioctl TCGETA error


MEANING 

This message is sent by /etc/ldhpsio, and means that
one of the actual download commands to the HPSIO 
failed. A message from perror follows and explains 
the errno status. 


PROBABLE CAUSE 


Unknown. 


SUGGESTED ACTION 


None. If the HPSIO appears to be hung, reboot the
system.


/dev/hp%x ioctl TCSETA error


MEANING 

This message is sent by /etc/ldhpsio. The actual
command to begin execution on the HPSIO board 
failed. A message from perror follows and explains 
the errno status. 


PROBABLE CAUSE 


Unknown. 


SUGGESTED ACTION 


None. If the HPSIO appears to be hung, reboot the 
system. 
double bit interrupt-- trap 249


MEANING 


The kernel has detected a memory failure, either an 
SMFAIL (local system memory/timeout) or an MPERR 
(multi-bit memory error). 


PROBABLE CAUSE 


Bad or marginally bad memory. 


SUGGESTED ACTION 


Have the hardware card with the failing memory 
replaced. 
Double panic: %s


MEANING


This message is displayed if, while the system is
processing a panic, a second panic occurs.


PROBABLE CAUSE

Catastrophic system failure that prevents normal 
panic processing, possibly including operations such 
as error logging or terminal I/O. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.


Dump beginning --


MEANING 


The system is beginning to dump its memory. This 
prompt is for information only. 


PROBABLE CAUSE


A system dump you requested (following the
instructions in crash(8) in the Administrator
Reference Manual) is underway.


SUGGESTED ACTION 


Continue to follow the procedure in crash(8) to 
complete the system dump. 
Dump complete


MEANING 


The system has finished dumping its memory to tape. 
This is for information only. 


PROBABLE CAUSE


A system dump you requested (following the
instructions in crash(8) in the Administrator
Reference Manual) has completed.


SUGGESTED ACTION 


Continue following the procedure in crash(8) to
capture the namelist file (for example, /unix).


Dumpdev is invalid. Will use first tape drive


MEANING 


The kernel could not select the tape drive. 


PROBABLE CAUSE 


The tape drive may be inaccessible. The
configuration information in the .cf file (described in 
config(1M) in the Administrator Reference Manual) 
may be incorrect. 


SUGGESTED ACTION 


If possible, continue following the procedure in 
crash(8) to complete the dump. If it is not possible, 
reboot the system. Once the system is back up, check 
the .cf file and make sure that the dump entry 
specifies the tape drive (tp) as the dump device. For 
an example of a dump entry, see config(1M) in the 
Administrator Reference Manual. Check the 
accessibility of the tape drive by using cpio(1), dd(1),
or tar(1).


Use dd(1) to copy the dump from tape to disk for 
analysis. You may analyze the dump by invoking 
crash(1M). The namelist file (for example, /unix) 
provides a symbol table with the locations of the 
kernel data structures and procedures. 
error on tape: %x %x %x %x %x %x


MEANING 


An I/O error occurred while the system was writing 
the dump contents to a tape. The first hex field 
contains the tape status and the others contain control 
information. 


PROBABLE CAUSE 


The tape may be write-protected. If the first hex
field is 0xff, the tape controller and/or drive is bad.


SUGGESTED ACTION 


If you have another write-enabled tape available (it
is a good idea to keep one handy for emergencies
such as this), use it to finish dumping the system.
When the dump is complete and your system is 
operational again, check the tape that failed. Discard 
it if it is damaged. 
/etc/hpsio.out file header read
/etc/hpsio.out system header read
/etc/hpsio.out section header read, section = x
/etc/hpsio.out data read


MEANING


These messages are sent by /etc/ldhpsio. All of them
indicate that /etc/ldhpsio had problems reading the
download object file. Each message is followed by a
message from perror explaining the errno status.


Instead of the errno status, the section read message
may have this format:


bytes requested = x bytes read = y


This means the section read command failed because
less data was returned than was requested.


PROBABLE CAUSE


The most likely cause of these messages is a corrupted
download object file.


SUGGESTED ACTION


Restore /etc/hpsio.out from the latest backup tape--
usually the full backup.
Four k Pagemap Enabled


MEANING 


At system initialization the status of the memory 
management unit (MMU) is displayed. 


PROBABLE CAUSE 


System initialization. 


SUGGESTED ACTION 


None. This is for information only. 
hpminit: not enough page blocks, %d short


MEANING 


The asynchronous/synchronous HPSIO driver needs 
%d more direct memory access (DMA) page blocks than 
are available in its current allocation. 


PROBABLE CAUSE 


A new or updated DMA driver has been added to the 
system. 


SUGGESTED ACTION 


Rebuild the kernel specifying more DMA page blocks. 
See config(1M) in the Administrator Reference Manual 
for a discussion of the parameter dmanpb, and for a
description of how to build and boot a kernel. The 
SPERRY 5000 Series Device Driver Guide also 
discusses DMA page blocks in Chapter 10, "A Virtual 
Disk Driver: Memory Management." 
HPSIO /dev/hp0%x controller at %y failed level 0
diagnostics Firmware error code - %z


MEANING 


The asynchronous/synchronous HPSIO driver reports 
that module %x (where %x is 0 through 7) at board 
address %y failed its on-board level 0 diagnostics. 


The Firmware error code message always accompanies
the first message and prints the level 0 diagnostics
summary report in hexadecimal. This is a 16-bit value
providing this information:


* Bits 0 through 7
  Channel status bits (0 = no error, 1 = error)


* Bits 8 through 11
  If bit 12 is set to 1, these bits indicate which level 0
  diagnostic is currently being run. Since this
  message appears only when the level 0 tests have
  failed, all of these bits should be set to 0.


* Bit 12
  Level 0 completion status (0 = complete, 1 = running)


* Bits 13 through 15
  Critical board error code:
  0 No error (should not be the case here)
  1 ROM checksum error
  2 RAM stuck bit/address decode error
  3 Mailbox register stuck bit error
  4 All channels degraded error

If any of these critical errors are reported, the board 
indicated is not functional and should be replaced. 
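

The bit layout above can be unpacked with a few shifts and masks. The following Python sketch is illustrative only: the function name and the sample value are made up, and it assumes that bit 0 is the least significant bit of the 16-bit value and that status bit n corresponds to channel n.

```python
# Critical board error codes carried in bits 13 through 15.
CRITICAL_ERRORS = {
    0: "No error",
    1: "ROM checksum error",
    2: "RAM stuck bit/address decode error",
    3: "Mailbox register stuck bit error",
    4: "All channels degraded error",
}


def decode_level0_summary(code):
    """Split a 16-bit level 0 diagnostics summary into its fields."""
    code &= 0xFFFF                      # keep only the low 16 bits
    channel_bits = code & 0xFF          # bits 0-7: channel status
    return {
        # Assumption: status bit n reports on channel n.
        "failed_channels": [ch for ch in range(8) if (channel_bits >> ch) & 1],
        "running_diagnostic": (code >> 8) & 0xF,      # bits 8-11
        "still_running": bool((code >> 12) & 1),      # bit 12
        "critical_error": CRITICAL_ERRORS.get((code >> 13) & 0x7, "unknown"),
    }


# Hypothetical firmware error code, chosen only to exercise the decoder.
print(decode_level0_summary(0x2005))
```

For the made-up value 0x2005, bits 13 through 15 hold 1 (ROM checksum error) and status bits 0 and 2 are set.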


PROBABLE CAUSE 


Hardware failure on the board. 


SUGGESTED ACTION 


If only the ROMs are indicated, replace them.
Otherwise, replace the entire board. 
HPSIO /dev/hp0%x controller at %y failed loading
software REASON %z: 


MEANING 


The asynchronous/synchronous HPSIO driver reports 
that controller /dev/hp0%x (where %x is 0 through 7) 
at board %y failed to download. 


The REASON always follows the first part of the
message. It gives the firmware error code (%z) and
displays an English language interpretation of it.


Possible reasons for failure include: 


Code  Interpretation

1     Checksum Error
      (probable software error)

2     Bad Buffer Address
      (probable software error)

3     Controller Malfunction
      (probable hardware error)

4     HPSIO Processor Bus Error
      (hardware or software error)

5     Bad Function Code
      (probable software error)

6     Bad Buffer Length Count
      (probable software error)

8     Parity Error
      (probable hardware error)

a     HPSIO Processor Unexpected Interrupt
      (probable hardware error)

      Unexpected DUART Interrupt
      (probable hardware error)

      HPSIO Processor Address Error
      (hardware or software error)

      Download Already in Progress
      (probable software error)


PROBABLE CAUSE


Hardware or software failure. 


SUGGESTED ACTION 


If a hardware error is indicated, replace the board. If 
a software error is indicated and the system 
recovered, take NO action. If the board appears 
hung, reboot the system to reset the board. 


1-30 UP-12218 


Error Messages 


HPSIO /dev/hp0%x controller at %y not responding
Firmware error code - %z 


MEANING


The asynchronous/synchronous HPSIO driver reports
that module %x (where %x is 0 through 7) at board
address %y did not give up the mailbox within the time
interval allowed.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION


Have the board replaced. If the problem persists
after replacement, then another software or system
hardware problem is indicated.


HPSIO /dev/hp0%x controller at %y returned an
invalid command at %z 


MEANING


The asynchronous/synchronous HPSIO driver reports
that module %x (where %x is 0 through 7) at board
address %y returned an address %z that was not the
location of a valid command block.


PROBABLE CAUSE 


Currently under investigation.


SUGGESTED ACTION


None. The board does not have to be replaced.
However, if the board appears to be hung, reboot the
system.


HPSIO loader already exists for /dev/hp%x


MEANING 


This message is sent by /etc/ldhpsio. Two 
/etc/ldhpsio commands have been issued to download 
the same board. 


PROBABLE CAUSE 

An entry in the file /etc/inittab invoking
/etc/ldhpsio may have been corrupted or
inadvertently changed or duplicated.

SUGGESTED ACTION 


Check /etc/inittab for a correct invocation of 
/etc/ldhpsio. 
HPSIO: address error pc = %x ad = %y


MEANING


This message is sent by the code downloaded to the
HPSIO. The on-board code suffered an address
error. %x is the program counter at the time the error
occurred, and %y is the address that caused the
error.


PROBABLE CAUSE 


A software failure. 


SUGGESTED ACTION 


None. Software reloads the board automatically and
continues.


HPSIO: buserror: pc=%x ad=%y


MEANING 


This message is sent by the code downloaded to the 
HPSIO. The on-board code encountered a bus error. 
%x is the program counter at the time of the bus error, 
and %y is the address that caused the error. 


PROBABLE CAUSE 


A hardware or software failure. 


SUGGESTED ACTION 


None. The on-board code is reloaded automatically. 
HPSIO: memory parity error: pc = %x


MEANING 


This message is sent by the code downloaded to the 
HPSIO. A memory parity error was detected in the 
on-board RAM. 


PROBABLE CAUSE 


A hardware failure.


SUGGESTED ACTION 


None. The on-board RAM will be reloaded 
automatically. 
HPSIO: power fail interrupt


MEANING 

This message is sent by the code downloaded to the 
HPSIO. The board has issued the interrupt that 
indicates a power failure. This interrupt was 
provided in system software when it was thought the 
HPSIO would be battery backed. Since the board is 
not battery backed, the interrupt is meaningless. 


PROBABLE CAUSE 


A hardware failure. 


SUGGESTED ACTION 


None. 




5000/20/40/50 




HPSIO: Timeout table overflow 


MEANING 

This message is sent by the code downloaded to the 
HPSIO. The on-board code has made more timeout
requests than can be handled by the table allocated 
for that purpose. 

PROBABLE CAUSE 

The software has not started the system clock or the 
clock is not working properly. 


SUGGESTED ACTION 


Contact the Support Center and arrange to have the 
HPSIO board checked. 
iaddress > 2^24


MEANING 


When updating a file’s i-node on the file system, the 
kernel found a block number in the i-node to be larger 
than is permissible. 


PROBABLE CAUSE 


Software and/or hardware problem--usually a 
corrupted file system. 


SUGGESTED ACTION 


To check the state of any file system you suspect is
corrupted, unmount the file system and check it with 
fsck(1M). If you suspect the root file system is 
corrupted, you will have to enter single user mode to 
check it. 


This error can also be generated by new or modified 
device drivers that have not been completely 
debugged. 


If no drivers have been added or modified, the error 
can also be attributed to a disk drive and/or 
controller, or a memory problem. Contact the Support 
Center. 


See fsck(1M) in the Administrator Reference Manual. 




illegal SUS log function 


MEANING 


An illegal start-up subsystem (SUS) log function was 
called. 


PROBABLE CAUSE 


None. This should never happen. 


SUGGESTED ACTION 


Reboot the system. 
inode table overflow


MEANING 


The kernel has detected that a request was made to
read an inode structure from disk, but no in-memory
inode structures were available to store it. As a
result, the operation which initiated the request
failed.


PROBABLE CAUSE 


The system has not been tuned to match the workload,
so the workload is taxing the system’s inode table
capacity.


SUGGESTED ACTION 


Reduce the workload or rebuild the kernel specifying 
a larger inode table. See config(1M) in the 
Administrator Reference Manual for a discussion of 
the parameter inodes and for a description of how to 
build and boot a kernel. 
Kernel too large


MEANING 


The system has determined that the current 
configuration is larger than can be supported with the 
amount of memory available to the machine. 


PROBABLE CAUSE 


Configuration error. 


SUGGESTED ACTION 


Reboot the system using /unix.old. Check the system 
description file (.cf--described in config(1M) in the 
Administrator Reference Manual) for configuration 
information that has changed due to tuning activities 
or the installation of new drivers (for example, 
communications drivers). Correct the .cf file and
rebuild the kernel by following the procedure 
described in config(1M). 
LOG FLUSH FAILED %d


MEANING 

In-core log records created during initialization 
device testing cannot be flushed to the start-up 
subsystem (SUS) log. 

PROBABLE CAUSE 


Memory boards incorrectly seated. 


SUGGESTED ACTION


Bring the system down. Turn off power to the
system. Reseat the memory boards, restore power,
and reboot the system.


mag tape %d needs write ring


MEANING 


An attempt to write to a magnetic tape failed because 
the tape was not write-enabled. 


PROBABLE CAUSE 


The tape has no write-ring inserted. 


SUGGESTED ACTION


Remove the tape from the drive, insert a write ring
and reinsert the tape in the drive. Try the write
command again.
mag tape %d not ready


MEANING 


An attempt to access a magnetic tape failed because 
the drive was not ready. 


PROBABLE CAUSE 


The drive may not be powered on, or it may not be 
on-line. The control cable to the drive may not be 
connected properly, or the drive may be faulty. 


SUGGESTED ACTION 


Turn on the power to the drive, place it on-line and 
reissue the command. If this message is returned 
again, check the drive's control cable. If the cable is 
properly connected, suspect the drive itself of being 
faulty. Check the hardware, or contact the Support
Center and arrange to have it checked by a Customer
Engineer.
mag tape err:


MEANING 


An attempt to access a magnetic tape failed. 


PROBABLE CAUSE 

Operator or media error. The tape may be
write-protected.

SUGGESTED ACTION 

Check the command that caused the error to be
returned. If it seems okay, then suspect that the tape
is write-protected or bad. Try reissuing the command
with a write-enabled or new tape in the drive.
mag tape gate timeout


MEANING


The tape driver timed out while waiting for the
no-operation (NOP) command used to clear interrupts to
complete. The tape controller is not responding.


PROBABLE CAUSE 


The control cable to the drive may not be connected,
or the drive itself may be faulty.


SUGGESTED ACTION 


Check the drive’s control cable. If the cable is 
connected properly, suspect the drive of being 
faulty. Check the hardware, or contact the Support 
Center and arrange to have it checked by a Customer 
Engineer. 
malloc on process table copy: Not enough space


MEANING 


While reading the /unix file to determine where the 
process table was in memory, the ps utility 
encountered an inconsistency. 


PROBABLE CAUSE 


The kernel in memory does not match that in the 
namelist file /unix. The file may have been replaced 
during installation of a product or by the system
administrator. The system may have been loaded from
something other than /unix--for instance /unix.old. 


SUGGESTED ACTION


Ignore the message and reboot the system from /unix.
The kernel in memory then will match the kernel in
/unix.
max program size = %d


MEANING 

At system initialization the maximum allowable size for 
programs (code, data, stack) is displayed.


PROBABLE CAUSE


System initialization. 


SUGGESTED ACTION 


None. This is for information only, but you should 
record this value in the system log: people developing 
software packages will want to know the maximum 
program size.
maxmem = %d


MEANING 


At system initialization the maximum available system 
memory is displayed. 


PROBABLE CAUSE


System initialization.


SUGGESTED ACTION 


None. This is for information only, but you should 
record this value in the system log. 
memfree %x


MEANING 


The kernel found a zero in the addresses of free 
memory pages. 


PROBABLE CAUSE 


A hardware or software failure. 


SUGGESTED ACTION 


Check any new device drivers that have not been 
debugged completely. Also check the system 
description file (.cf--described in config(1M) in the 
Administrator Reference Manual) to make sure the 
configuration information is correct. If none of these 
uncovers the source of the problem, suspect hardware 
problems and contact the Support Center. 
multibus IO or dmaerror - trap 239


MEANING 

A multibus device is unable to access local system 
memory. 

PROBABLE CAUSE 


A hardware failure or an improperly seated memory 
card. 


SUGGESTED ACTION 


Turn the machine power off, reseat all memory cards, 
reseat all controller cards, turn the power back on, 
and reboot. If reseating the cards does not alleviate 
this condition, replace the cards. 
MEANING


The kernel memory management software has detected 
an invalid state. 


PROBABLE CAUSE 


Malfunctioning kernel. This should never happen. 


SUGGESTED ACTION 


Generate a system dump and reboot the system as 
described in crash(8) in the Administrator Reference 
Manual. Once this system message is displayed,
system scheduling and processing continue.
Therefore, the user should perform the system dump
quickly for the information to be meaningful. A system
dump would be performed by killing power in manual
mode.
No HPSIO code file name supplied


MEANING 

This message is sent by /etc/ldhpsio. A file (usually
/etc/hpsio.out) containing the code executed on the 
board has not been specified. 

PROBABLE CAUSE 

An entry in the file /etc/inittab invoking /etc/ldhpsio
to load the HPSIO may have been corrupted or 
inadvertently changed. 


SUGGESTED ACTION 


Check /etc/inittab for a correct invocation of 
/etc/ldhpsio. 
No HPSIO device nodes supplied


MEANING 

This message is sent by /etc/ldhpsio. No device
nodes (such as /dev/hp00) have been supplied on the 
command line. 

PROBABLE CAUSE 

An entry in the file /etc/inittab invoking /etc/ldhpsio 
to load the HPSIO may have been corrupted or 
inadvertently changed. 


SUGGESTED ACTION 


Check /etc/inittab for a correct invocation of 
/etc/ldhpsio.
No swap space for exec args


MEANING 


During exec processing there is insufficient swap 
space to temporarily hold the passed arguments. A 
"no memory" error is returned to the caller. 


PROBABLE CAUSE 


The system has not been tuned to match its workload, 
so the workload is taxing the system's swap space 
capacity. 


SUGGESTED ACTION 


Either reduce the workload or increase the swap space 
to accommodate the workload. Before increasing the 
swap space, back up all files that have been created or
changed since the system was last installed. To
increase the swap space, follow the procedure
described in the Installation and Verification Guide.
no file


MEANING 


The kernel has detected that the open file table is full 
and a new reference to a file has failed. 


PROBABLE CAUSE 


The system has not been tuned to match the workload, 
so the workload is taxing the system's open file table 
capacity. 


SUGGESTED ACTION 


Reduce the workload or rebuild the kernel specifying 
a larger open file table. See config(1M) in the 
Administrator Reference Manual for a discussion of 
the parameter files and a description of how to build and
boot a kernel. 
out of text


MEANING 


The kernel has detected that an attempt to allocate a 
shared text structure failed because there is a lack of 
available structures.


PROBABLE CAUSE 


The system has not been tuned to match the workload, 
so the workload is taxing the system’s text table 
capacity. 


SUGGESTED ACTION 


Reduce the workload or rebuild the kernel specifying 
a larger text table. See config(1M) in the 
Administrator Reference Manual for a discussion of 
the parameter texts, and for a description of how to 
build and boot a kernel. 
panic:


MEANING 


This is a general prefix attached to other messages. 
It indicates severity--specifically that the kernel has 
encountered a condition that it cannot deal with and 
your system probably is about to go down. You will 
not see panic: by itself--it is always returned with a 
more specific message attached. 


The next 29 messages deal with panic situations. 
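
If console output is captured to a log, the prefix makes panic messages easy to pick out. The following Python sketch is one possible way to do so and is not part of the system software: the function names is_panic and panic_detail are hypothetical. The comparison ignores case because this listing contains both "panic:" and "PANIC:" forms.

```python
def is_panic(message):
    """Return True if a console message carries the panic prefix."""
    return message.strip().lower().startswith("panic:")


def panic_detail(message):
    """Return the specific message that follows the prefix, or None."""
    if not is_panic(message):
        return None
    # Drop the six-character prefix and any surrounding blanks.
    return message.strip()[len("panic:"):].strip()
```

For example, panic_detail("panic: lost mem") yields "lost mem", which can then be looked up in the entries that follow.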
panic: bad mem free


MEANING 

The kernel has attempted to free a memory page which 
is outside the range of physical memory determined to 
be valid during system initialization. 

This indicates that the kernel can no longer manage
the memory successfully, a condition which normally
should never occur.


PROBABLE CAUSE 


None. 


SUGGESTED ACTION 


Generate a system dump and reboot the system by 
following the procedure described in crash(8) in the 
Administrator Reference Manual. 
panic: bad mem free-list


MEANING 

The kernel has detected that the count of available 
memory pages has gone negative, meaning more memory
has been allocated than should be available. 

This message indicates that the kernel can no longer 
manage the memory successfully, a condition which 
normally should never occur. 


PROBABLE CAUSE 


None. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
panic: bad swap address


MEANING 


The concept of swapping is critical to both swapping 
and demand paging kernels. Exceptions (errors) 
detected during these procedures usually result in 
some type of system panic. 


This message informs you that the kernel has made an 
attempt to swap into or out of a disk block which is not 
in the partition designated as swap space. 

This indicates that the kernel can no longer manage
the memory successfully, a condition which normally
should never occur.


PROBABLE CAUSE 


None. 


SUGGESTED ACTION 


Generate a system dump and reboot the system by 
following the procedure described in crash(8) in the 
Administrator Reference Manual. 
panic: bflush: bad free list


MEANING 


The linked list of free I/O buffers is corrupted. The 
processor has halted. 


PROBABLE CAUSE 


A hardware or software failure. 


SUGGESTED ACTION 


Generate a system dump, and reboot the system by 
following the procedure in crash (see crash(1M) in the 
Administrator Reference Manual) to gather 
information from the dump about the nature of the 
problem and the namelist file /unix. 


Check any new or modified device drivers that have 
not been debugged completely. Make sure you have 
correct configuration information in the system 
description file (the .cf file--described in config(1M)
in the Administrator Reference Manual).


If neither of these attempts succeeds in clearing the
problem, suspect bad hardware and contact the
Support Center.
panic: binit


MEANING 

This occurs during system initialization when the 
kernel cannot find enough memory or memory 
management unit (MMU) resources to initialize the 
system I/O buffer pool. 

PROBABLE CAUSE 

The kernel has grown because: 


• new device drivers have been added, or

• one or more of its parameters have been increased.


SUGGESTED ACTION 


Reboot the system with the previous kernel (for 
example, /unix.old). Rebuild the kernel, specifying 
a smaller number of buffers. See config(1M) in the 
Administrator Reference Manual for a discussion of 
the parameter buffers and for a description of how to 
build and boot a kernel. 
panic: cxofree: no pb freed


MEANING 


The kernel has detected an attempt to free the memory 
management unit (MMU) resources of some process, an 
attempt which failed because no such resources could 
be located. This message indicates that no MMU 
resources are available for user processes. 


PROBABLE CAUSE 

None. This is one of several panics which should 
almost never occur and are trapped at miscellaneous 
locations within the kernel. 

SUGGESTED ACTION 

Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.


panic: devtab 


MEANING 

The kernel has detected that a hash on a device 
number and block number failed to produce a buffer 
address. 

PROBABLE CAUSE 

None. This is one of several panics which should
never occur and are trapped at miscellaneous locations
within the kernel.


SUGGESTED ACTION 


Generate a system dump and reboot the system by 
following the procedure described in crash(8) in the 
Administrator Reference Manual. 
panic: dup alloc


MEANING 

The kernel has detected that the memory allocation 
algorithm, which hashes to a free memory page, has 
yielded an already allocated page. 

This indicates that the kernel can no longer manage 
the memory successfully, a condition which normally 
should never occur. 


PROBABLE CAUSE 


None. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
panic: dup free


MEANING 


An attempt has been made to free an already freed 
memory page. 


This indicates that the kernel can no longer manage 
the memory successfully, a condition which normally 
should never occur. 


PROBABLE CAUSE 


None. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
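
The conditions behind "bad mem free", "dup alloc" and "dup free" are all checks on the state of a page before it changes hands. The Python sketch below models those three checks under simplifying assumptions. It is not the kernel's allocator: the names PagePool, KernelPanic and panic_message are hypothetical, and an exception stands in for the panic itself.

```python
class KernelPanic(Exception):
    """Stands in for a panic; a real kernel would stop instead (assumption)."""


class PagePool:
    """Toy page allocator that checks three memory-management invariants."""

    def __init__(self, npages):
        self.npages = npages                  # pages found valid at start-up
        self.free = list(range(npages))       # free list
        self.allocated = set()                # pages currently handed out

    def alloc(self):
        if not self.free:
            return None                       # out of pages; not a panic here
        page = self.free.pop()
        if page in self.allocated:
            raise KernelPanic("dup alloc")    # free list yielded a page in use
        self.allocated.add(page)
        return page

    def release(self, page):
        if not 0 <= page < self.npages:
            raise KernelPanic("bad mem free") # outside valid physical memory
        if page not in self.allocated:
            raise KernelPanic("dup free")     # page is already free
        self.allocated.remove(page)
        self.free.append(page)


def panic_message(action, *args):
    """Run `action` and return the panic text it raises, or None."""
    try:
        action(*args)
    except KernelPanic as exc:
        return str(exc)
    return None
```

In every case the check fails only when the allocator's own bookkeeping is already damaged, which is why these entries list no probable cause and ask only for a system dump.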
panic: dup unfree


MEANING 

In a demand paging system, the kernel has made an 
attempt to reclaim a memory page which is marked as 
being already allocated. 

This indicates that the kernel can no longer manage
the memory successfully, a condition which normally
should never occur.


PROBABLE CAUSE 


None. 


SUGGESTED ACTION 


Generate a system dump and reboot the system by 
following the procedure described in crash(8) in the 
Administrator Reference Manual.
panic: iinit


MEANING 


During system initialization, the kernel cannot read 
the super-block for the root file system. 


PROBABLE CAUSE 


A media hardware error (for example, a bad block on a 
disk) or the kernel is configured for a root device 
which does not exist. 


SUGGESTED ACTION 


If you have just rebuilt the kernel by following the
procedure in config(1M) in the Administrator
Reference Manual, then you can reboot the system in
manual mode by using /unix.old as the kernel. This
procedure will not work, however, if the boot block is
bad and if /unix and /unix.old are on the same device.
When the system is operational, check the system
description file (.cf--described in config(1M)) for
errors. The root device specified in the .cf file might
be incorrect. Also, try rewriting the boot block by
executing


format -draw device -maint.


If you have not rebuilt the kernel, contact the
Support Center for assistance in determining the
status of your disk. Your disk may have to be
reformatted.
panic: IO err in swap


MEANING 


The concept of swapping is critical to both swapping 
and demand paging kernels. Exceptions (errors) 
detected during these procedures usually result in 
some type of system panic. 


This message informs you that the swap device driver 
has detected an I/O error during an attempt to read or 
write swap space. 

It indicates that the kernel can no longer manage the 
memory successfully, a condition which normally 
should never occur. 


PROBABLE CAUSE 


Media hardware error, such as a bad block on a disk. 


SUGGESTED ACTION


Contact the Support Center for assistance in
determining the status of your disk. The disk may
have to be reformatted.
panic: lost mem


MEANING 

The kernel has detected that the count of available 
memory pages indicates some memory is free but none 
can be found in the free list. 

This indicates that the kernel can no longer manage 
the memory successfully, a condition which normally 
should never occur. 


PROBABLE CAUSE 


None. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
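
Both "lost mem" and "bad mem free-list" describe a disagreement between the count of free pages and the free list. The Python sketch below expresses the two comparisons as they are worded in this listing. The function name check_free_count and its arguments are hypothetical, and the kernel's actual tests may differ.

```python
def check_free_count(free_count, free_list):
    """Compare the free-page counter with the free list.

    Returns the name of the panic the mismatch corresponds to,
    or None when neither condition applies.
    """
    if free_count < 0:
        return "bad mem free-list"   # more pages handed out than exist
    if free_count > 0 and not free_list:
        return "lost mem"            # counter says free, list is empty
    return None
```

A count of 2 with an empty list therefore maps to "lost mem", while a count of -1 maps to "bad mem free-list".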
panic: lost text


MEANING 


Management of shared text reduces the demand for 
physical memory but is dependent on a rigid 
interaction between various kernel structures. This 
message means that the kernel has detected the loss of 
some shared text. 


Two conditions may be detected: 

• during swapping or shared text processing a text
  structure is found to have a positive count of loaded
  references but there are no linked processes

• during demand paging a page fault occurs in a
  shared text area but no associated text structure
  can be located.


PROBABLE CAUSE 


None. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
panic: lost text proc


MEANING 


Management of shared text reduces the demand for 
physical memory but is dependent on a rigid 
interaction between various kernel structures. 


This panic occurs when, during shared text 
processing, the kernel detects a text structure with a
positive count of attached processes, but no process 
structure can be located with a back pointer to the 
text structure. 


PROBABLE CAUSE 


None. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
panic: lost u on swapin


MEANING 


The concept of swapping is critical to both swapping 
and demand paging kernels. Exceptions (errors) 
detected during these procedures usually result in 
some type of system panic. 


The kernel, using some rudimentary validation 
procedures, has determined that an improper user 
structure has been swapped in. This indicates that 
the kernel can no longer swap processes. 


Along with this message, the kernel returns the 
values: 


p0br %x spt %x


where p0br is the memory management register, and
spt is the location of the process's page table.


PROBABLE CAUSE 


The kernel stack grew big enough to overwrite the 
swap extent array which maps user address space to 
swap space. This can result in an incorrect block 
number being used to swap in the structure. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
panic: no fs


MEANING 

The kernel, during a search of the mount table, has 
detected that the root file system is not mounted.

PROBABLE CAUSE

None. This is one of several panics which should
never occur and are trapped at miscellaneous locations
within the kernel.

SUGGESTED ACTION

Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
panic: no imt


MEANING 

The kernel, during a search of the mount table, could 
not locate a file system for which an inode is marked as 
mounted. 

PROBABLE CAUSE 

None. This is one of several panics which should
never occur and are trapped at miscellaneous locations
within the kernel.

SUGGESTED ACTION

Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
panic: no procs


MEANING 


This tells you that the kernel has detected a missing 
process structure. During forking of a new process, 
the initial code determined that a process structure is 
available for the new process, but, during actual 
creation, no such structure could be located. 


PROBABLE CAUSE 

None. This is one of several panics which should
never occur and are trapped at miscellaneous locations
within the kernel.

SUGGESTED ACTION

Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
PANIC: Not enough dma page blocks


MEANING 


The kernel has detected an insufficient number of 
direct memory access (DMA) page blocks for the 
current number of DMA drivers. 


PROBABLE CAUSE 


A new or updated DMA driver has been added to the 
system. 


SUGGESTED ACTION 


Rebuild the kernel, specifying more DMA page blocks. 
See config(1M) in the Administrator Reference Manual 
for a discussion of the parameter dmanpb and for a 
description of how to rebuild and boot a kernel. The 
SPERRY 5000 Series Device Driver Guide also 
discusses DMA page blocks in Chapter 10, "A Virtual 
Disk Driver: Memory Management." 
panic: sd: unable to initialize root and/or swap device


MEANING 

The system root device, in this case a small computer
system interface (SCSI), could not be initialized.

PROBABLE CAUSE

Either hardware or software malfunction. If this panic 
occurs while you are loading a new kernel, suspect 
that the kernel was built incorrectly. 

SUGGESTED ACTION 

If a new kernel is failing, reboot the system in manual
mode using /unix.old as the kernel. Otherwise,
report the problem to the Support Center.


panic: sd: unable to open root device 


MEANING 


The system root device, in this case a small computer
system interface (SCSI), could not be opened for
access.

PROBABLE CAUSE

Either hardware or software malfunction. If this panic 
occurs while you are loading a new kernel, suspect 
that the kernel was built incorrectly. 

SUGGESTED ACTION 

If a new kernel is failing, reboot the system in manual
mode using /unix.old as the kernel. Otherwise,
report the problem to the Support Center.
panic: sd: unable to open swap device


MEANING 

The system swap device, in this case a small computer 
system interface (SCSI), could not be opened for 
access. 

PROBABLE CAUSE 

Either hardware or software malfunction. If this panic 
occurs while you are loading a new kernel, suspect 
that the kernel was built incorrectly.

SUGGESTED ACTION

If a new kernel is failing, reboot the system in manual
mode using /unix.old as the kernel. Otherwise,
report the problem to the Support Center.
panic: Timeout table overflow


MEANING 


The kernel has attempted to schedule a timeout for a 
process but the callout table is full. 


PROBABLE CAUSE 


The kernel was configured with an insufficient
number of callout entries for the current processing 
load. 


SUGGESTED ACTION 


Lighten the current processing load by rescheduling 
large processes to run at different times. 


Rebuild the kernel, specifying a larger number of 
callout entries. See config(1M) in the Administrator 
Reference Manual, for a discussion of the parameter 
calls, and for a description of how to rebuild and boot 
a kernel. 
panic: trap


user = %a ps = %b pc = %c trap type %d


MEANING 

This is the most common panic message. It occurs if 
the kernel encounters an exception condition (usually 
external) of which it has no knowledge or from which 
it cannot recover. The processor prints information 
about the panic and enters an infinite loop or halts. 


The strings in the second line describe:


• user = %a

  %a gives the page address of the current user
  structure.

• ps = %b

  %b is the status register at the time of the
  exception.

• pc = %c

  %c is the program counter at the time of the
  exception.

• trap type %d

  %d is the MC68xxx processor exception vector
  number. For fault types 2 and 3 the faulted
  address (eaddr) is displayed on the console.
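
When you record a trap panic for the Support Center, it can help to split the second line into its four fields. The Python sketch below is one way to do that and is not part of the system software. The function name parse_trap_line is hypothetical, and the layout it expects (the labels user, ps, pc and trap type, each followed by a value) is taken from the description above. The first three values are kept as text because this listing does not state the number base in which they are printed.

```python
import re

# Labels follow the field descriptions above; spacing is matched loosely
# because the exact console layout is an assumption.
TRAP_LINE = re.compile(
    r"user\s*=\s*(?P<user>\S+)\s+"
    r"ps\s*=\s*(?P<ps>\S+)\s+"
    r"pc\s*=\s*(?P<pc>\S+)\s+"
    r"trap\s*type\s+(?P<trap_type>\d+)"
)


def parse_trap_line(line):
    """Return the four trap fields as a dict, or None if the line differs."""
    match = TRAP_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["trap_type"] = int(fields["trap_type"])   # vector number
    return fields
```

Given an invented line such as "user = 1f4 ps = 2004 pc = 3a7c trap type 2", the function returns the user, ps and pc values as strings and the trap type as the integer 2.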
The trap types (defined in the file trap.h) are:


Number   Type


         reserved addressing fault
         segmentation fault (kernel mode bus error)
         compatibility mode fault
         division fault
         chk fault
         trapv fault
         privileged instruction fault
         trace trap
         xfc instruction fault
         reserved operand fault


PROBABLE CAUSE


Since the interrupt is unexpected, it is difficult to
assign a probable cause to the event. There are,
however, some possibilities:


• New or modified device drivers that have not been
  debugged completely

• Depletion of system resources

• Incorrect information in the .cf file described in
  config(1M) in the Administrator Reference Manual

• Hardware problems


SUGGESTED ACTION


Generate a system dump, and reboot the system by
following the procedure in crash (see crash(1M) in the
Administrator Reference Manual) to identify the cause
of the trap. If you cannot correct the problem, contact
the Support Center.
panic: WD driver requires 2 DMA page blocks


MEANING 


A disk driver cannot get the direct memory access 
(DMA) page blocks it requires. 


PROBABLE CAUSE 


A new or updated DMA driver has been added to the 
system.


SUGGESTED ACTION 


Rebuild the kernel specifying more DMA page blocks. 
See config(1M) in the Administrator Reference Manual 
for a discussion of the parameter dmanpb, and for a 
description of how to build and boot a kernel. The 
SPERRY 5000 Series Device Driver Guide also 
discusses DMA page blocks in Chapter 10, "A Virtual 
Disk Driver: Memory Management."
panic: WD out of DMA page blocks


MEANING 


A disk driver cannot get the direct memory access 
(DMA) page blocks it requires. 


PROBABLE CAUSE


A new or updated DMA driver has been added to the 
system. 


SUGGESTED ACTION


Rebuild the kernel specifying more DMA page blocks. 
See config(1M) in the Administrator Reference Manual 
for a discussion of the parameter dmanpb, and for a 
description of how to build and boot a kernel. The 
SPERRY 5000 Series Device Driver Guide also 
discusses DMA page blocks in Chapter 10, "A Virtual 
Disk Driver: Memory Management."
panic: xque


MEANING 


On a 5000/20 or 5000/40 system, this message appears 
only in the swapping kernel, and means that the 
kernel could not allocate enough memory to swap in a 
shared text segment. 


On a 5000/30 or 5000/50 system, this message means 
that the kernel has run out of page table resources to 
address virtual space. 


PROBABLE CAUSE 

The system does not have memory to match its 
workload. 

SUGGESTED ACTION 

For a 5000/20 or 5000/40 system, reduce the workload 
or switch to the demand paging kernel supplied with 
the system. For a 5000/30 or 5000/50 system, reduce 
the workload or reconfigure the kernel. 

Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
panic: xSswapseg bad swx


MEANING 


The concept of swapping is critical to both swapping 
and demand paging kernels. Exceptions (errors) 
detected during these procedures usually result in 
some type of system panic. 


The kernel has detected a reference to a disk block 
which is not in the partition designated as swap space. 
This indicates that the kernel can no longer swap 
processes. 


PROBABLE CAUSE 

The kernel stack grew big enough to overwrite the 
swap extent array which maps user address space to 
swap space. This can result in an invalid block
number being selected for a swap in. 

SUGGESTED ACTION 

Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
Parallel Printer is not on-line


MEANING 


This message is sent by the code downloaded to the 
HPSIO. The parallel printer appears to be offline. 


PROBABLE CAUSE 


The printer may not have been placed on-line before 
data was sent to it. Also, the printer may be faulty, or 
may not be connected (for example, its cable may be 
loose). 


SUGGESTED ACTION 


Place the printer on-line and reissue the command 
sending data to it. If you get this message again, 
check the printer’s connections and make sure they 
are all right. If this does not clear the condition, you 
can assume the printer is faulty. To make sure, use 
the same command to try sending data to another 
connected parallel printer you're sure is on-line. If
this printer accepts the data, the first printer is 
faulty. 
Parallel Printer is out of paper


MEANING 


This message is sent by the code downloaded to the 
HPSIO. The parallel printer appears to be out of
paper.


PROBABLE CAUSE 


Usually, this message indicates that the printer has 
run out of paper while it is receiving data. The 
message also will be returned if data has been sent to a
printer that is not configured correctly. Finally, it 
could be the result of a faulty printer. 


SUGGESTED ACTION 


Check the printer’s paper supply. If there is enough 
paper and its feeding track is not jammed, check to 
make sure the printer is properly configured and that 
the command sending data to it addressed the printer 
correctly. If none of these is the case, assume the 
printer is faulty. Try sending the data to another 
configured parallel printer. If this printer accepts 
the data and does not return this message, the first 
printer is faulty. 


--Please insert a non write-protected, tape 


MEANING 


The kernel is directing you to insert a write-enabled 
tape. 


PROBABLE CAUSE 


A system dump you requested while following the 
instructions in crash(8) in the Administrator 
Reference Manual is underway, and a tape is needed 
to receive the contents of a system dump. 


SUGGESTED ACTION 


Put a tape in the drive and continue following the
procedure in crash(8) of the Administrator Reference
Manual.


Power Recovery # %d in progress 


MEANING 


The system is recovering successfully from a loss of 
power during which the system memory was preserved 
by the battery backup. The system counts and 
displays the number of power recoveries that have 
occurred since system initialization.


PROBABLE CAUSE


Power loss due to a weather-related surge or accidental
disruption of the power supply. Someone, for instance,
could have jostled or temporarily disconnected the
power cable from the box.


SUGGESTED ACTION 


Note the incident in the system log. 
proc on q


MEANING 

When making a process ready to run after the 
occurrence of a wakeup event, the kernel found the 
process to be already on the system queue. 


PROBABLE CAUSE 


Indeterminate. This should never happen. 


SUGGESTED ACTION 


Generate a system dump and reboot the system by 
following the procedure described in crash(8) in the 
Administrator Reference Manual. If your system is 
unaltered from its delivery state, contact the Support 
Center. 
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Reloading /dev/hp%x due to Fatal error 


MEANING 


This message is sent by /etc/ldhpsio. The HPSIO 
specified (/dev/hp%x) is being reloaded because of a 
fatal error on the board. 


PROBABLE CAUSE 


Examples of fatal errors: bus error, memory parity 
error, address error. 


SUGGESTED ACTION 


None. The on-board code is reloaded automatically. 
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Reloading /dev/hp%x due to Power failure 


MEANING 


This message is sent by /etc/ldhpsio. The HPSIO 
specified (/dev/hp%x) is being reloaded because of a 
power failure. 


PROBABLE CAUSE 


Power failure. 


SUGGESTED ACTION 


None. The on-board code is reloaded automatically. 
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sd: Insufficient DMA page block resources allocated. 


MEANING 


The SCSI driver has detected an insufficient number 
of direct memory access (DMA) page blocks. 


PROBABLE CAUSE 


A new or updated DMA driver has been added to the 
system. 


SUGGESTED ACTION 


Rebuild the kernel, specifying more DMA page blocks. 
See config(1M) in the Administrator Reference Manual 
for a discussion of the parameter dmanpb and for a 
description of how to rebuild and boot a kernel. The 
SPERRY 5000 Series Device Driver Guide also 
discusses DMA page blocks in Chapter 10, "A Virtual 
Disk Driver: Memory Management." 
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sd: Unable to allocate DMA segment registers. 


MEANING 


The small computer system interface (SCSI) driver 
has detected an insufficient number of direct memory 
access (DMA) segment registers. 


PROBABLE CAUSE 


A new or updated DMA driver has been added to the 
system. 


SUGGESTED ACTION 


Rebuild the kernel, specifying more DMA page blocks. 
See config(1M) in the Administrator Reference Manual 
for a discussion of the parameter dmanpb and for a 
description of how to rebuild and boot a kernel. The 
SPERRY 5000 Series Device Driver Guide also 
discusses DMA page blocks in Chapter 10, "A Virtual 
Disk Driver: Memory Management." 
%s on bad dev %o(8)


MEANING 


A problem has been detected with a file on a block 
device with a major device number that exceeds the 
number of block device drivers generated by the 
system. 


The string (%s) may be: bad block, bad count, bad
free count, no space or out of inodes. %o is the minor
device number.


PROBABLE CAUSE 


A hardware or software failure. 


SUGGESTED ACTION 


Check new device drivers, modified device drivers, 
and configuration information in the .cf file described 
in config(1M) in the Administrator Reference Manual. 
If you cannot isolate a software failure, a hardware 
failure is indicated. Contact the Support Center. 
spurious interrupt - trap 24


MEANING 


An interrupt disappeared while it was being serviced. 


PROBABLE CAUSE 


An unusual hardware condition or a hardware failure. 


SUGGESTED ACTION 


Generate a system dump by following the procedure 
described in config(1M) in the Administrator 
Reference Manual. Check the system description file 
(.cf--also described in config(1M)) for configuration 
information which has changed recently due to the 
installation of new drivers. Make any necessary 
corrections, then rebuild the kernel by following the 
procedure described in config(1M). 
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spurious multi-bus interrupt 


MEANING 


An unexpected interrupt was received from a device 
on the multi-bus.


PROBABLE CAUSE 


Something has been added to the system: a new 
driver, an updated driver, and/or new hardware. If 
this is not true, some of your current hardware has 
failed. 


SUGGESTED ACTION 


Reboot the system to see if that clears the problem. If 
this doesn’t work, check the system description file 
(.cf, described in config(1M) in the Administrator 
Reference Manual) for recent changes due to the 
installation of new drivers--probably drivers
associated with new hardware. 
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spurious timer interrupt -clk3 


MEANING 


An unexpected interrupt was received from an on- 
board (unused) timer. 


PROBABLE CAUSE 


A new or updated direct memory access (DMA) driver 
has been added to the system. 


SUGGESTED ACTION 


Generate a system dump by following the procedure 
described in config(1M) in the Administrator 
Reference Manual. Check the system description file 
(.cf--also described in config(1M)) for configuration 
information which has changed recently due to the 
installation of new drivers. Make any necessary 
corrections, then rebuild the kernel by following the 
procedure described in config(1M). 
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spurious timer interrupt - trap 228 


MEANING 


An unexpected interrupt was received from an on- 
board (unused) timer. 


PROBABLE CAUSE 


A new or updated direct memory access (DMA) driver 
has been added to the system. 


SUGGESTED ACTION 


Generate a system dump by following the procedure 
described in config(1M) in the Administrator 
Reference Manual. Check the system description file 
(.cf--also described in config(1M)) for configuration 
information which has changed recently due to the 
installation of new drivers. Make any necessary 
corrections, then rebuild the kernel by following the 
procedure described in config(1M). 
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spurious timer interrupt - trap 229 


MEANING 


An unexpected interrupt was received from an on- 
board (unused) timer. 


PROBABLE CAUSE 


A new or updated direct memory access (DMA) driver 
has been added to the system. 


SUGGESTED ACTION 


Generate a system dump by following the procedure 
described in config(1M) in the Administrator 
Reference Manual. Check the system description file 
(.cf--also described in config(1M)) for configuration 
information which has changed recently due to the 
installation of new drivers. Make any necessary 
corrections, then rebuild the kernel by following the 
procedure described in config(1M).
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stray interrupt at %x 


MEANING 


A device has interrupted through an unexpected 
vector. The vector will be printed in hexadecimal. 
The vector printed is usually the correct value for the 
device, unless it is 0--which is a reserved location. 


PROBABLE CAUSE 


This error can be caused by a device specified at an 
incorrect vector in the .cf file described in config(1M)
in the Administrator Reference Manual. 


SUGGESTED ACTION 


Check the .cf file for an incorrect vector. Make the 
necessary corrections, then rebuild the kernel by 
following the procedure described in config(1M). If 
the system description file was correct, or the vector 
was 0, suspect hardware problems and contact the 
Support Center. 
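
The decision described above (check the .cf file unless the vector is 0) can be sketched in a few lines of Python. This is only an illustration: the function name check_stray_vector and the sample line are hypothetical, and the sketch assumes the console line has been copied exactly as displayed.

```python
def check_stray_vector(message):
    """Return (vector, advice) for a 'stray interrupt at %x' console line."""
    prefix = "stray interrupt at "
    if not message.startswith(prefix):
        raise ValueError("not a stray interrupt message")
    vector = int(message[len(prefix):].strip(), 16)  # %x prints hexadecimal
    if vector == 0:
        advice = "vector 0 is reserved: suspect hardware"
    else:
        advice = "compare this vector with the device entry in the .cf file"
    return vector, advice

# Example line (invented for illustration).
vector, advice = check_stray_vector("stray interrupt at 4c")
print(vector, advice)
```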
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System Nodename: %s 


MEANING 


At system initialization the system nodename is 
displayed. 


Note that UUCP communications use the name in
/usr/lib/uucp/SYSTEMNAME, not the nodename
displayed at system initialization and on the login
banner.


PROBABLE CAUSE 


System initialization. 


SUGGESTED ACTION 


None. This is for information only. 
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System Processor: %x 


MEANING 


At system initialization the processor type is 
displayed. 


PROBABLE CAUSE


System initialization. 


SUGGESTED ACTION 


None. This is for information only. 
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System Release: %s 


MEANING 


At system initialization, the software release level is 
displayed. 


PROBABLE CAUSE 


System initialization. 


SUGGESTED ACTION 


None. This is for information only. 
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System Version: %s 


MEANING 


At system initialization the software version is 
displayed. 


PROBABLE CAUSE


System initialization. 


SUGGESTED ACTION 


None. This is for information only. 
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text bad swap 


MEANING 


Swapping is critical to both swapping and demand
paging kernels. Exceptions (errors) detected during
these procedures usually result in some type of
system panic.


This message informs you that the kernel has detected
that the disk block allocated from the map of available
swap space is not in the partition designated as swap
space.


This indicates that the kernel can no longer manage
the memory successfully, a condition which normally
should never occur.


PROBABLE CAUSE 


None. 


SUGGESTED ACTION


Generate a system dump and reboot the system by
following the procedure described in crash(8) in the
Administrator Reference Manual.
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TIME OUT ERROR 


MEANING 


The disk driver timed out while awaiting a response 
from the disk controller. 


PROBABLE CAUSE


A cable is loose. 


SUGGESTED ACTION 


Check the cables or contact the Support Center and 
arrange to have them checked. 
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--Type any character when ready 


MEANING 


This prompt follows the message prompting you to
insert a tape in the tape drive to receive the contents
of a system dump. It requests that you indicate to the
system (by pressing any character key on the console
keyboard) when the tape is in its drive and ready to
go.


PROBABLE CAUSE 

A system dump you requested while following the 
instructions in crash(8) in the Administrator 
Reference Manual is underway. 


SUGGESTED ACTION 


Press any key and continue to follow the procedure in 
crash(8).
unexpected int: decimal vector no. %x, pc=%y


MEANING 


This message is sent by the code downloaded to the 
HPSIO. The HPSIO received an interrupt on a vector 
that is not defined. %x is the vector number in 
decimal, and %y is the program counter at the time of 
the error. 


PROBABLE CAUSE 


A hardware or software failure. 


SUGGESTED ACTION 


None. The on-board code is reloaded automatically. 
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user memory = %d 


MEANING 


At system initialization the maximum memory available 
for user programs is displayed. 


PROBABLE CAUSE 


System initialization. 


SUGGESTED ACTION 


None. This is for information only, but you should 
record this value in the system log. 
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WARNING: insufficient swap space to fork non- 
superuser processes 


MEANING 


The kernel could not swap out a process because the 
process size exceeded the swap space remaining. 


This is one of several messages indicating non-fatal 
errors. The condition described is not normal and 
often has already been recovered from by the time the 
message is returned. Sometimes the message serves as 
a warning of an impending failure. 


PROBABLE CAUSE 


The executing kernel was configured with a maximum 
user address space (maxspace) greater than the 
available swap space. This message indicates that any 
attempt to fork a non-superuser process fails. It will 
be encountered only on 32-bit systems which support 
the maxspace parameter. 


SUGGESTED ACTION 


Rebuild the kernel, either specifying a smaller
process address size or increasing the swap space to
accommodate the process size. See config(1M) in the
Administrator Reference Manual for a discussion of
the maxspace parameter and for a description of how
to build and boot a kernel.


Before increasing the swap space, back up all files that
have been created or changed since the system was
last installed. To increase the swap space, follow the
procedure described in the Installation and
Verification Guide.
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WARNING: swap space running out needed %d blocks 


MEANING 


The kernel has found insufficient space on the swap 
device when attempting to swap out a given process or 
a copy of a pure text image. However, it reclaimed 
additional space by freeing space which belonged to an 
inactive "sticky-bit" process. 


This is one of several messages indicating non-fatal 
errors: abnormal conditions which often are already 
recovered from by the time the message is returned. 
Such messages sometimes warn of an impending failure.


PROBABLE CAUSE 


The system has not been tuned to match its workload, 
so the workload is taxing the system’s swap space 
capacity. 


SUGGESTED ACTION 


Either reduce the workload or increase the swap space
to accommodate the workload. Before increasing the
swap space, back up all files that have been created or
changed since the system was last installed. To
increase the swap space, follow the procedure
described in the Installation and Verification Guide.
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xswap %x 


MEANING 


The kernel failed to swap out the currently running
process and free its memory. %x is the process
number.


PROBABLE CAUSE


A software and/or hardware failure.


SUGGESTED ACTION 


Check any new or modified device drivers that have
not been debugged completely. Make sure you have
correct configuration information in the system
description file (.cf--described in config(1M) in the
Administrator Reference Manual). If neither of these
proves to be the source of the problem, suspect bad
hardware and contact the Support Center.
0.25 INCH TAPE DRIVER ERROR MESSAGES FROM
ERRLOG


These messages are displayed by errpt to describe 
error conditions logged by the tape driver. 


unrecoverable data error on tape: The tape drive has 
determined that the data on the tape are unreadable 
due to something discovered by a cyclical redundancy 
check (CRC) or other errors in the data block. This 
can occur during read or write operations. 


cannot locate block in error on tape: The tape drive 
was not able to confirm that the last block transferred 
was the data block in error. This can occur during 
read or write operations. 


illegal command on tape drive: A command given to 
the tape drive was not valid or was not permitted due 
to a previous command. 


no data detected due to lack of data on a read from 
tape: An attempt was made to read an area of tape 
which has been erased or is beyond the end of 
recorded data. 


timeout on tape read or write operation: The tape 
drive did not complete the requested operation in the 
time allotted. 


tape cartridge not inserted correctly: The tape 
cartridge was removed while an operation was in 
progress. 


tape drive is either not connected or not
selected: The tape drive is not connected to the host
or has not been selected by the host.




trying to write to a write protected tape: The tape 
cartridge write protection feature has been set to 
SAFE, or the switch which detects the feature is 
malfunctioning. 


recovered data errors reading tape: The tape drive 
required more than eight retries to read data from the 
tape medium. This indicates a bad spot on the tape. 
Replace the tape.


unknown error from tape: The tape drive reported
an error without status bits or with status bits which
do not fall into one of the above categories.
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THE TTY SYSTEM ERROR LOG 


Below is a figure showing an example of an entry in 
the system error log. The text following the figure 
explains each section in detail. 


Date/Time of Processor Port Terminal Incident:
Fri Apr 18 15:31:49 1986

Incident Sequence Number          0003
Subsystem/Module                  ptob
Controller Address                00e00105
Hardware Status                   Threshold on PARITY Tally Error
Number of Errors                  100
Mode of Operation at Error Time   Speed=9600, Parity=Odd
                                  1 Stop bit, 7 bits per character
Link Signals at Error Time        DCD, CTS, DTR, RTS
Simultaneous Device Activity      Channels b,a

TTY Tallies since last incident on Fri Apr 11 15:11:22

Input Characters..0000051735   Frame Errors.....0000000007
Input Errors......0000000100   Parity Errors....0000000100
Output Blocks.....0001085679   Overrun Errors...0000000000
Output Errors.....0000000000   Overflow Errors..0000000008
Carrier Losses....0000000003


Figure 1. Sample Error Log Entry
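
If you keep copies of error log entries in a file, the tally block at the bottom of an entry like the one in Figure 1 is regular enough to read mechanically. The Python sketch below is an illustration only: it assumes each tally is printed as a name, a run of dots and a zero-padded count, as in the figure, and the name parse_tallies is hypothetical. It is not a replacement for errpt.

```python
import re

# Tally block copied from the sample entry in Figure 1.
SAMPLE = """\
Input Characters..0000051735   Frame Errors.....0000000007
Input Errors......0000000100   Parity Errors....0000000100
Output Blocks.....0001085679   Overrun Errors...0000000000
Output Errors.....0000000000   Overflow Errors..0000000008
Carrier Losses....0000000003
"""

# A tally is a name, one or more dots, then a decimal count.
TALLY = re.compile(r"([A-Za-z][A-Za-z ]*?)\.+(\d+)")

def parse_tallies(text):
    """Return a dictionary mapping each tally name to its integer count."""
    return {name.strip(): int(count) for name, count in TALLY.findall(text)}

tallies = parse_tallies(SAMPLE)
print(tallies["Parity Errors"])
```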


The Incident Sequence Number is a count of the error 
log entries for this device. In this example, the error 
logged is the third error for the device. 


Subsystem/Module specifies the name of the device 
using the standard hardware naming convention. See 
errpt(1M) for legal names. 


Controller Address is the address of the controller for
this device. As another example, if this error had
been on device tt00 on the first HPSIO board, the
controller would have been listed as 001f1400.


Hardware Status spells out the type of error being 
logged. Most errors that the tty system encounters 
are not logged individually: they are simply counted. 
These error counts are referred to as tallies. When a 
tally becomes equal to a value (called a threshold) 
specified either by default or by a value you enter, 
the system creates an error log entry. Tally 
threshold logs display the value of all tallies that the 
driver is currently keeping for the channel in 
question. Individual tallies are kept for each error 
and for each channel. Tally thresholds can be set 
individually for each kind of error and for each 
channel. 
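
The relationship between tallies and thresholds can be pictured with a small model. The Python sketch below is a simplified illustration, not the tty driver's code: the class name ChannelTallies and the error-kind strings are hypothetical, and the sketch assumes that tallies are not reset when an entry is logged, a point the text above leaves open.

```python
from collections import defaultdict

class ChannelTallies:
    """Per-channel counters with one optional threshold per kind (illustrative)."""

    def __init__(self, thresholds):
        self.thresholds = dict(thresholds)   # kind -> threshold value
        self.counts = defaultdict(int)       # kind -> current tally
        self.log = []                        # log entries created so far

    def record(self, kind):
        """Count one event; log a snapshot when the tally equals its threshold."""
        self.counts[kind] += 1
        if self.counts[kind] == self.thresholds.get(kind):
            # A threshold entry shows every tally kept for the channel.
            self.log.append((kind, dict(self.counts)))

# Thresholds chosen to mirror the sample entry: 100 parity errors, 1 overflow.
chan = ChannelTallies({"parity": 100, "overflow": 1})
chan.record("frame")                 # counted, but no threshold is set for it
for _ in range(100):
    chan.record("parity")            # the 100th event creates a log entry
print(chan.log)
```

In this model a kind without a threshold is still counted, so it appears in the snapshot logged for another kind.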


The list below gives the types of tally records that are 
logged. The tallies for input and output characters 
are not errors.


• Threshold on Input Tally


Records of this type are logged when the number of
characters received reaches the threshold.


• Threshold on Input Error Tally


A record of this type is logged when input errors of
all kinds reach the threshold.


• Threshold on Output Tally


Output tally records are logged when the number of
characters sent reaches the threshold.


• Threshold on Output Error Tally


This tally is not used by the tty system. It tallies
output errors for synchronous communications
packets.


• Threshold on Frame Errors


Framing errors occur when the channel receives a 
character that does not have the correct number of 
start and/or stop bits. When framing errors reach 
the specified threshold, a record of this type is 
logged. 


• Threshold on Parity Errors


Parity errors occur when a data bit is changed 
(binary) during transmission. When the number of 
parity errors reaches the threshold for a channel, a 
log entry of this type is created. 


• Threshold on Overrun Errors


Overrun errors occur when data is received faster 
than the controller can read it from the USART. 
This condition usually can be cleared up by 
establishing a flow control protocol between the 
transmitting device and the receiver. 


• Threshold on Overflow Errors


Overflow errors occur when the receiving system’s 
buffer becomes full and the transmitting system is 
not honoring any flow control protocol. The 
consequence of an overflow error is that the entire 
buffer in the driver is discarded and data is lost. 
This situation can usually be corrected by 
establishing a flow control protocol that both the 
receiver and the transmitter will honor. The 
threshold for these errors usually is one (1). 


• Threshold on Carrier Loss


This tallies the number of physical disconnects that 
have occurred. 
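
To make the parity errors in the list above concrete, the Python sketch below computes the parity bit for a character and shows how a single changed data bit is detected. It is an illustration of the general technique only; the function names are hypothetical, and the defaults (7 data bits, odd parity) are taken from the sample entry in Figure 1.

```python
def parity_bit(data, bits=7, odd=True):
    """Return the parity bit a sender would attach to the low `bits` of data."""
    ones = bin(data & ((1 << bits) - 1)).count("1")
    # Odd parity makes the total number of 1 bits odd; even parity makes it even.
    return (ones + 1) % 2 if odd else ones % 2

def has_parity_error(data, received_parity, bits=7, odd=True):
    """True when the received parity bit does not fit the received data."""
    return parity_bit(data, bits, odd) != received_parity

sent = 0x41                      # the character "A", binary 1000001
p = parity_bit(sent)             # parity bit transmitted with it
damaged = sent ^ 0x02            # one data bit changed in transit
print(has_parity_error(sent, p), has_parity_error(damaged, p))
```

A check of this kind catches any odd number of changed bits; two changed bits in the same character cancel out and go unnoticed.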
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Number of Errors displays the tally value at the time 
the record was logged. This is also the threshold 
value for the error on the channel. 


Mode of Operation at Error Time describes the 
physical characteristics of the line when the error 
occurred. These characteristics include line speed, 
parity, number of stop bits, and character size. Line 
speed is given as baud rate (see terminfo(4) for legal 
speeds). Parity is even, odd or none. Number of stop 
bits is 1, 1.5 or 2. The character size can be 5, 6, 7 
or 8 bits. 


Link Signals at Error Time displays the RS-232-C 
signals that were active when the error occurred. 
The signals are Request to Send (RTS), Carrier 
Detect (CD), Data Set Ready (DSR), Clear to Send 
(CTS) and Data Terminal Ready (DTR). 


Simultaneous Device Activity displays the names of 
other channels on the same controller that were active 
when the error occurred. 


Finally, the section labeled TTY Tallies since last
incident on... gives the date of the previous error
logged for this channel. It then prints the
current values of all the tallies for this channel. All
tallies are shown--even if they have not reached their
thresholds.
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The messages in this section of the book are in 
alphabetical order. Some refer you to other messages 
for more complete information in order to eliminate a 
cumbersome repetition of information. 


At the end of this section, there is a complete list of 
DISK/TAPE messages. Other messages in the text 
deal with disk and tape problems, but this list 
contains all the codes you might see. 


Use the table of contents at the beginning of this 
section to find the page number of a message you want 
to research. 
addr regs: %x %x %x %x %x %x %x %x

data regs: %x %x %x %x %x %x %x %x

User Trap Type %s, pc = %x, usp = %x, fault = %x

logical address that generated fault %x

process name that was running = %s

process id = %xh

pc = %x, usp = %x, sr = %x

User Mode,

System Mode,

trap type %s, code = %x, fault register = %x


MEANING 

These ten messages are returned together when the 
kernel encounters a system or bus error. The kernel 
collects and displays this information. 


PROBABLE CAUSE 


System or bus error. 


SUGGESTED ACTION 


Use the contents of these error messages to help you
find and correct system problems. If you cannot
correct a problem, contact the Unisys Support Center
or your Unisys representative.
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avail mem = %d 


This informative message is returned as part of the
pair:

realmem = %d 

avail mem = %d 

MEANING 

Real memory is the number of bytes of physical
memory on the CPU. Available memory is the number
of bytes of physical memory actually available to user
processes.


PROBABLE CAUSE 


None. This message is for information only. 


SUGGESTED ACTION 


No action is required, but you may want to record 
these figures in the system log for future reference. 


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cannot find /etc/slave file 


MEANING 

In a multi-processor system the kernel expects to find 
the code and data for the slave processors in the file 
/etc/slave20. When the kernel fails to find it there, it 
returns this message. 


PROBABLE CAUSE 


Missing /etc/slave20 file.


SUGGESTED ACTION


Install the /etc/slave20 file in the /etc directory. The file
should be available from the /usr/sys directory.
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cannot read /etc/slave file 


MEANING 


In a multi-processor system, the kernel checks the 
contents of the file /etc/slave20. This message means 
that, once the kernel found the file, it could not read
it.


PROBABLE CAUSE 


This could indicate file system corruption. The file 
itself is in core image format, so there are no data 
structures in the file that the kernel examines to 
verify that the file is correct. This means the kernel 
actually had a physical problem reading the file 
(probably due to disk failure), not that it found 
incorrect contents in the file. It does not imply that 
there has been a check of the contents of the file: it 
means only that for some reason the kernel could not 
read it. 


SUGGESTED ACTION


Check the contents of the file. If you can find no
problem there, shut down the system and reboot. If
the problem persists, contact the Support Center.
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can’t open rp(%d) 


MEANING 


The driver cannot open the reserved area of the disk. 


PROBABLE CAUSE 


Such a failure to open a logical volume usually means 
the logical volume is off-line. It could also mean that 
the disk is not ready, there is a bad timing register or 
the node has been removed. 


SUGGESTED ACTION 


Make sure the disk is on-line, accessible and has its 
cable connected properly. If these check out, try 
switching cables to a different drive. If this drive is 
accessible, the first drive is bad. Shut the system 
down and reboot. If the problem persists, contact the 
Support Center. 
check bits = %xh syndrome bits = %xh byte
select = %xh


SEE: memfault erraddr = %xh %s %s %s
cmos clock chip not responding


MEANING 

To keep more accurate track of time, MC68020 boards
have a real time clock on them. Communication with 
this chip did not proceed correctly. 

PROBABLE CAUSE 

There are three possibilities:

• the clock chip is not present

• the chip is not functioning

• the battery is drained.


SUGGESTED ACTION 


Shut down the system and reboot. If the problem 
persists, contact the Support Center. 
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coproc violation 


MEANING 


(NOTE: This is for MC68020 boards specifically.)
Communications between 68020 microcode and the 
68881 (the Floating Point Processor--FPP) resulted in 
confusion between the two processors. This message 
indicates that the FPP encountered confusion because 
the normal protocol was not being observed by the 
68020 microcode. 


PROBABLE CAUSE 


Hardware problem. 


SUGGESTED ACTION 


Shut down the system and reboot. If the problem 
persists, contact the Support Center. 




%d processor system 


MEANING 


This message, displayed during system initialization, 
identifies how many CPUs are contained in your 
system. 


PROBABLE CAUSE 


System initialization. 


SUGGESTED ACTION 


None: this is for information only, but it should be 
recorded in your system log. 
DANGER: mfree map overflow %x lost %d items at %d


MEANING 


One of the tables mapped through the system’s malloc 
mechanism has overflowed. (See malloc(3C) in the 
Programmer Reference Manual.) %x gives the address 
of the table. By searching for this number in the 
system namelist, you can discover the name of the 
malfunctioning map (swapmap, msgmap or semmap).
You can also use the crash command (see crash(1M) in 
the Administrator Reference Manual) to find the map’s 
name. 


PROBABLE CAUSE 


This typically results from fragmentation of the 
resource managed with the map array. 


SUGGESTED ACTION 


Increase the number of entries currently allocated for 
the appropriate map in the system description file .cf. 
See config(1M) in the Administrator Reference Manual 
for a discussion of the parameters swapmap, msgmap 
and semmap, and for a description of how to build and 
boot a kernel. 
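
The way fragmentation overflows a map can be pictured with a simplified model. The Python sketch below is an illustration under stated assumptions, not the kernel's code: it assumes a map is a fixed number of slots, each describing one free segment by address and size, and that adjacent free segments are merged into one slot. The class name ResourceMap and its fields are hypothetical.

```python
class ResourceMap:
    """Fixed number of slots, each holding one free [address, size] segment."""

    def __init__(self, slots):
        self.slots = slots
        self.free = []      # free segments, kept sorted by address
        self.lost = 0       # units dropped because no slot was available

    def mfree(self, addr, size):
        """Return a segment to the map; report False if it had to be dropped."""
        i = 0
        while i < len(self.free) and self.free[i][0] < addr:
            i += 1
        joins_prev = i > 0 and sum(self.free[i - 1]) == addr
        joins_next = i < len(self.free) and addr + size == self.free[i][0]
        if joins_prev and joins_next:
            # The freed segment bridges two neighbours: one slot is released.
            self.free[i - 1][1] += size + self.free[i][1]
            del self.free[i]
        elif joins_prev:
            self.free[i - 1][1] += size
        elif joins_next:
            self.free[i][0] = addr
            self.free[i][1] += size
        elif len(self.free) < self.slots:
            self.free.insert(i, [addr, size])
        else:
            # Isolated fragment and no slot left: the map overflows.
            self.lost += size
            return False
        return True

rmap = ResourceMap(slots=2)
rmap.mfree(0, 10)
rmap.mfree(20, 10)
overflowed = not rmap.mfree(40, 10)   # third isolated fragment, no slot left
rmap.mfree(10, 10)                    # fills the gap, merging 0..30 into one slot
print(overflowed, rmap.lost, rmap.free)
```

In this model, raising the number of slots, which corresponds to allocating more entries for the map, is what prevents the third fragment from being lost.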
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DANGER: out of swap space needed %d blocks 


MEANING


This message follows the message

DANGER: mfree map overflow %x lost %d items at %d


and informs you that the kernel has exhausted all 
available swap space. This condition may lose data 
being swapped and/or crash the system. It also means 
that the system may begin operating incorrectly. 


PROBABLE CAUSE 


This typically results from fragmentation of the 
resource managed with the map array. 


SUGGESTED ACTION 


Increase the number of entries currently allocated for 
the appropriate map in the system description file .cf. 
See config(1M) in the Administrator Reference Manual 
for a discussion of the parameters swapmap, msgmap 
and semmap, and for a description of how to build and 
boot a kernel.
data regs: %x %x %x %x %x %x %x %x


SEE: addr regs: %x %x %x %x %x %x %x %x 
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Dev start error 


MEANING 


Part of a raw I/O process request has been
completed, but the request cannot complete. dev start
is returning an error because:

• it could not get an entry from the free list, or

• it could not lock the tas bit, or

• it did not have a proper request setup.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Shut down and reboot the system. 
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Disk controller time out. Disk sync not completed. 


MEANING 


If an uninterruptible power source (UPS) is present 
during a power failure, the kernel will attempt to 
update the contents of its buffers to the disks. The 
kernel grants the controller a reasonable amount of 
time to complete this. If the controller fails to respond 
within this time, the kernel assumes that the high 
speed disk/tape controller (HSDT) is bad, or that 
something has happened (hardware or software 
failure) that is preventing the driver from 
continuing. The kernel shuts down the update
attempt and continues the power fail processing, 
which consists at that point of turning off the UPS. 


PROBABLE CAUSE 


Power failure. 


SUGGESTED ACTION 


There is nothing you can do at this point. Some data 
has been lost: you will be able to find the extent of 
the loss when you reboot the system. The file systems 
are likely to be corrupted and should be repaired with 
fsck(1M) (see the Administrator Reference Manual). 


Contact the Support Center to see if your HSDT needs 
replacement. 
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disk empty free list 


MEANING 


After an interrupt routine, the master CPU (MCPU) 
has taken a request off the response queue and placed 
it on the free list. The MCPU looks at the buffer 
pointer and decides whether it was a disk or a tape 
request, then performs one final check of the free list 
to make sure the item got onto it. During this check, it 
fails to see the item it just placed on the list and 
concludes something is wrong. 


PROBABLE CAUSE


Usually a hardware problem. 


SUGGESTED ACTION


Shut down the system and reboot. If the problem
persists, contact the Support Center: the board may
have to be replaced.
Disk sync completed


MEANING 


This occurs during a power failure when an 
uninterruptible power source (UPS) is present. It 
tells you that the kernel has finished writing all of its 
buffers to disk. 


PROBABLE CAUSE 


Power failure. 


SUGGESTED ACTION 


None. This provides information only. 
dk(%d) blk(%d) rc(%xh) intword(%xh) %s


dk(%d) blk(%d) rc(%x) intword(%x)


MEANING 


These are disk error messages. The variables 
translate as follows: 


dk(%d) is the disk number (minor device number)

blk(%d) is the block number on that device

rc(%xh) is the return code--a hexadecimal number
that indexes into a string table in dk.s

intword is the interrupt word (usually zero) that
contains the interrupt code received
from the HSDT. The list below gives its possible
values.

%s (in the first message only) explains the return code.


Possible values of intword


• system sector bad
• illegal disk/tape command
• invalid sector number
• disk not formatted
• invalid system sector information
• drive not ready
• physical drive not mounted
• no such logical drive
• phy/log drive out of range
• invalid data in request
• no partial block r/w
• skip track table is full
• skip track error
• hit double alternate sector
• too many sectors
• hit alternate sector
• data CRC error
• SEEK error
• ready change
• cannot rezero
• header id error
• end of cylinder
• overrun
• no data
• not writeable
• header search
• time out on disk r/w
• wrong gap length
• abnormal termination
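

When these messages are collected from a console log, the fields can be pulled apart mechanically. The following is a minimal Python sketch, not part of the system software: the function name parse_dk_line and the sample lines in the usage note are hypothetical, and the pattern assumes the two layouts shown at the head of this entry.

```python
import re

# Pattern for the two console forms described above. The trailing text
# (the %s explanation) is optional because only the first form prints it.
DK_LINE = re.compile(
    r"dk\((?P<dk>\d+)\)\s+blk\((?P<blk>\d+)\)\s+"
    r"rc\((?P<rc>[0-9a-fA-F]+)h?\)\s+intword\((?P<intword>[0-9a-fA-F]+)h?\)"
    r"(?:\s+(?P<text>.+))?$"
)


def parse_dk_line(line):
    """Return the fields of a dk(...) console line, or None if it does not match."""
    m = DK_LINE.search(line.strip())
    if m is None:
        return None
    return {
        "disk": int(m.group("dk")),             # minor device number (decimal)
        "block": int(m.group("blk")),           # block number (decimal)
        "rc": int(m.group("rc"), 16),           # return code (hexadecimal)
        "intword": int(m.group("intword"), 16), # interrupt word (hexadecimal)
        "text": m.group("text"),                # explanation, first form only
    }
```

For example, a made-up line such as "dk(2) blk(1043) rc(1ah) intword(0h) data CRC error" yields disk 2, block 1043, return code 26 and the text "data CRC error".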


PROBABLE CAUSE 


Some kind of disk error such as a bad spot in the disk. 


SUGGESTED ACTION 


Contact the Support Center.


Double panic: %s


MEANING


This message is displayed if, while the system is
processing a panic, a second panic occurs.

PROBABLE CAUSE


Catastrophic system failure that prevents normal
panic processing, possibly including operations such
as error logging or terminal I/O.


SUGGESTED ACTION


Shut down the system (if necessary) and reboot,
following the instructions in config(1M) in the
Administrator Reference Manual.


format violation 


MEANING 


This applies only to the 68020, and means an attempt
was made to restore an invalid stack frame to the 68881
floating-point coprocessor.


PROBABLE CAUSE


Hardware or software failure.


SUGGESTED ACTION


Shut down the system and reboot. If the problem 
persists, contact the Support Center. 
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gclp_interrupt(XXX, YYY) 


MEANING 

The general communications (gc) lock routine is a 
general purpose lock routine that sets up a 
semaphore. 

This message informs you that the gclp driver has
received an illegal interrupt (XXX) from gcp board
YYY.

NOTE: Every module has an interrupt--a 16-bit
register in which 3 bits are hardware related and the
remainder are software definable.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error. 
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gc_interrupt(XXX, YYY, ZZZ) 


MEANING 

The general communications (gc) lock routine is a
general purpose lock routine that sets up a
semaphore. This message informs you that the gcp
board (ZZZ) interrupted the master CPU (MCPU) with
an illegal interrupt (YYY).

NOTE: Every module has an interrupt--a 16-bit
register in which 3 bits are hardware related and the
remainder are software definable.


PROBABLE CAUSE


There are three places the problem could originate:

• the inter-process communication bus (ICB)
• the contents of the hardware interrupt register on
  the MCPU
• the board that generated the interrupt.


SUGGESTED ACTION 


Debug the error. 
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gc_lock timeout id XXX board YYY lock ZZZ 


MEANING 

The general communications (gc) lock routine is a 
general purpose lock routine that sets up a 
semaphore. 

This message informs you that the master CPU (MCPU) 
failed to lock resource ZZZ on board YYY. After 1K 
tries the MCPU gave up, leaving the resource in state 
XXX. 
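
The give-up-after-N-tries behavior described above can be pictured with a short Python sketch. It is an illustration only, not the kernel's code: the state values, the states table, and the LockTimeout exception are hypothetical, and reading "1K" as 1024 is an assumption.

```python
MAX_TRIES = 1024  # "1K tries" in the text; treating 1K as 1024 is an assumption

# Hypothetical state encoding; the real values are not given in this manual.
FREE, MCPU_OWNS, SLAVE_OWNS, BOARD_ERROR = 0, 1, 2, 3


class LockTimeout(Exception):
    """Raised when the lock could not be taken within max_tries attempts."""

    def __init__(self, state, board, lock):
        super().__init__(f"gc_lock timeout id {state} board {board} lock {lock}")
        self.state = state


def gc_lock(states, board, lock, max_tries=MAX_TRIES):
    """Take lock `lock` on `board` for the MCPU, or raise LockTimeout.

    `states` maps (board, lock) to the current state. On real hardware the
    test and the set happen in one indivisible cycle; plain Python code
    only imitates that here.
    """
    key = (board, lock)
    for attempt in range(1, max_tries + 1):
        if states[key] == FREE:
            states[key] = MCPU_OWNS
            return attempt          # number of tries it took
    # Still not free after max_tries: report the state we last saw.
    raise LockTimeout(states[key], board, lock)
```

The bound matters: without it, a board that never frees the resource would leave the MCPU spinning forever instead of reporting the state it found.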

PROBABLE CAUSE 

This normally represents a software problem,
although the mechanism uses a hardware assist. There
are three conditions which will hang the board:

• the master CPU (MCPU) owns it
• the slave CPU owns it
• the board has an error condition.


SUGGESTED ACTION 


Debug the error.


gc_proc(XXX, YYY) address error


MEANING 

The general communications (gc) lock routine is a
general purpose lock routine that sets up a
semaphore. The routine also contains a check for a
valid address, which in this case it has failed to find.
The invalid address passed is YYY and XXX is the
command gc_proc was trying to send.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error. 
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gc_tas_reset id XXX board YYY 


MEANING 


The test and set (tas) bit identifies resource
availability. The MC68000 processor uses an
uninterruptible read/modify/write cycle which allows
information to be accessed, modified and rewritten
while ensuring that the resource is not used by
another device at the same time. In a multi-processor
system this cycle must be emulated, and the tas bit
helps accomplish this.


The general communications (gc) lock routine is a
general purpose lock routine that sets up a semaphore.


In this case, the master CPU (MCPU) tried to release
the tas resource on gcp board YYY and found that the
resource was in state XXX and that the MCPU did not
own it as expected.
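

The ownership check on release can be sketched in a few lines of Python. This is only an illustration of the condition described above: the state values and the tas table are hypothetical, and the real routine's interface is not documented here.

```python
# Hypothetical state encoding; the real values are not given in this manual.
FREE, MCPU_OWNS, SLAVE_OWNS = 0, 1, 2


def gc_tas_reset(tas, board):
    """Release the tas resource on `board` if the MCPU owns it.

    `tas` maps a board number to the current state of its tas resource.
    Returns None on success, or the console message when the MCPU finds
    that it does not own the resource it is trying to release.
    """
    state = tas[board]
    if state != MCPU_OWNS:
        # Releasing a resource we do not hold means the bookkeeping is
        # already wrong, so report the state instead of changing it.
        return f"gc_tas_reset id {state} board {board}"
    tas[board] = FREE
    return None
```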


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error. 
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gc_tas_setiderror0: tas XXX id YYY board ZZZ 


MEANING 


The test and set (tas) bit identifies resource
availability. The MC68000 processor uses an
uninterruptible read/modify/write cycle which allows
information to be accessed, modified and rewritten
while ensuring that the resource is not used by
another device at the same time. In a multi-processor
system this cycle must be emulated, and the tas bit
helps accomplish this.


The general communications (gc) lock routine is a
general purpose lock routine that sets up a semaphore.


In this case, the master CPU (MCPU) has found tas
resource XXX in an illegal state YYY on gcp board
ZZZ. The bit should either be free or the slave
CPU should own it. Neither of these conditions was
true, and the MCPU returned this message.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error.


gc_tas_setiderror1: tas XXX id YYY board ZZZ


MEANING 


The test and set (tas) bit identifies resource
availability. The MC68000 processor uses an
uninterruptible read/modify/write cycle which allows
information to be accessed, modified and rewritten
while ensuring that the resource is not used by
another device at the same time. In a multi-processor
system this cycle must be emulated, and the tas bit
helps accomplish this.


The general communications (gc) lock routine is a
general purpose lock routine that sets up a semaphore.


This message informs you that the gc_tas_set routine
found the tas resource XXX in an illegal state YYY on
gcp board ZZZ.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error. 


gc_tas_set timeout id XXX board YYY 


MEANING 


The test and set (tas) bit identifies resource
availability. The MC68000 processor uses an
uninterruptible read/modify/write cycle which allows
information to be accessed, modified and rewritten
while ensuring that the resource is not used by
another device at the same time. In a multi-processor
system this cycle must be emulated, and the tas bit
helps accomplish this.


The general communications (gc) lock routine is a
general purpose lock routine that sets up a semaphore.


In this case, the master CPU (MCPU) has tried 10K
times to lock the tas resource on board YYY and
cannot because the slave owns it. The MCPU gives up
with the resource in state XXX.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error.


gctty_icb dev = XXX ABEND


MEANING 


The general communications (gc) lock routine is a
general purpose lock routine that sets up a semaphore.


Similar to gctty_tasint dev XXX, this message states
that the gc I/O board has encountered a hard failure
on tty dev XXX.

NOTE: Every module has an interrupt--a 16-bit
register in which 3 bits are hardware related and the
remainder are software definable.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error. 
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gctty_interrupt(XXX, YYY) 


MEANING 


The general communications (gc) lock routine is a
general purpose lock routine that sets up a semaphore.


This message informs you that the gctty driver has
received an illegal interrupt (XXX) from gcp board
YYY.


NOTE: Every module has an interrupt--a 16-bit
register in which 3 bits are hardware related and the
remainder are software definable.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error. 
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gctty_tasint dev XXX 


MEANING 


The test and set (tas) bit identifies resource
availability. The MC68000 processor uses an
uninterruptible read/modify/write cycle which allows
information to be accessed, modified and rewritten
while ensuring that the resource is not used by
another device at the same time. In a multi-processor
system this cycle must be emulated, and the tas bit
helps accomplish this.


The general communications (gc) lock routine is a
general purpose lock routine that sets up a semaphore.


If the gc I/O board detects an error in the tas
resource on tty dev XXX, it reports the condition to
the master CPU and sends this message to the console.

NOTE: Every module has an interrupt--a 16-bit
register in which 3 bits are hardware related and the
remainder are software definable.


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Debug the error. 
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gen tapeerror: %s 


MEANING 


An error was returned from the high speed tape/disk 
controller (HSDTINT) on a tape request. 


%s will be one of these values:

• drive not ready
• illegal disk/tape command
• invalid data in request


PROBABLE CAUSE 


Hardware failure. 


SUGGESTED ACTION 


Contact the Support Center. 
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hsdt hung 


MEANING


This is usually a result of the condition causing the
tas message. The master CPU (MCPU) cannot free the
tas bit, so it assumes the high speed disk/tape
(HSDT) board is hung.


PROBABLE CAUSE


Hardware or software failure.


SUGGESTED ACTION


Shut down the system and reboot. If the problem 
persists, contact the Support Center. 
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hsdtint tape cmd = %x 


MEANING


This is the same as the message:

invalid hsdtint tape cmd = %x

only in this case it applies to a nine track controller.


PROBABLE CAUSE 


This indicates either a hardware problem or an invalid 
driver command. 


SUGGESTED ACTION 


A hardware problem will necessitate replacement of 
the board. An invalid command could mean corrupted 
data or a device driver problem. Contact the Support 
Center. 


See the entry for tape error: rc(%x) for a list of
return code possibilities.


iaddress > 2^24


MEANING 


When updating a file’s i-node on the file system, a 
block number in the i-node was found to be larger 
than is permissible.
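
The 2^24 limit is what a three-byte address field would impose. The following Python sketch assumes, as in classic UNIX file systems, that the on-disk i-node stores each block address in three bytes; this manual does not state the format, and the function name and the big-endian byte order are likewise assumptions made for illustration.

```python
ADDR_LIMIT = 1 << 24   # 2^24: the first value that no longer fits in 3 bytes


def pack_block_address(blkno):
    """Pack a block number into a 3-byte on-disk address (assumed format).

    Raises ValueError carrying the console message text when the block
    number cannot be represented in 3 bytes.
    """
    if blkno not in range(ADDR_LIMIT):
        raise ValueError("iaddress > 2^24")
    return blkno.to_bytes(3, "big")   # byte order is an assumption
```

Under this assumption, a block number of this size in an i-node cannot be a legitimate address, which is why the message points to a corrupted file system rather than to a full disk.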


PROBABLE CAUSE 


This is usually due to operator error. The system is 
not being brought down correctly. 


It can also be generated by new device drivers that 
have not been completely debugged.


SUGGESTED ACTION 


To check the state of any file system you think is 
corrupted, unmount the file system and use fsck(1M) 
on it. See fsck(1M) in the Administrator Reference 
Manual. If you suspect the root file system is 
corrupted, you will have to enter single user mode to 
check it. 


If none of the above attempts works, the error can
also be attributed to a disk drive and/or controller.
Contact the Support Center.
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improper config -- kernel halted 


MEANING 


The kernel and the boot code are interrelated to a 
certain extent: the kernel expects to boot off of a 
version of the boot code that has a certain set of 
capabilities for that kernel. This is a message stating 
that the kernel has not found the proper code and has 
halted. 


PROBABLE CAUSE 

Often this will result if the kernel finds an extremely
old boot code image on the reserved area.


SUGGESTED ACTION


Check to make sure that the boot image matches the kernel
and that the kernel has been generated correctly. If
you're sure these are okay, contact the Support
Center.
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Inode table overflow 


MEANING 


The system i-node table has overflowed and a new file
could not be accessed. See open(2), creat(2),
access(2) and stat(2) in the Programmer Reference
Manual.
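

The overflow can be pictured as a fixed-size table with no free slot left. The Python sketch below is a simplified model, not the kernel's implementation: the class, its method names, and the slot layout are hypothetical, and the table size stands in for the configured number of i-node entries.

```python
class InodeTable:
    """Toy model of a fixed-size in-core i-node table."""

    def __init__(self, ninode):
        # Each slot is None (free) or [dev, ino, reference_count].
        self.slots = [None] * ninode
        self.console = []

    def iget(self, dev, ino):
        """Return the slot index for (dev, ino), or None on overflow."""
        free_index = None
        for i, slot in enumerate(self.slots):
            if slot is None:
                if free_index is None:
                    free_index = i          # remember the first free slot
            elif slot[0] == dev and slot[1] == ino:
                slot[2] += 1                # already in core: share the entry
                return i
        if free_index is None:
            self.console.append("Inode table overflow")
            return None
        self.slots[free_index] = [dev, ino, 1]
        return free_index

    def iput(self, index):
        """Drop one reference; free the slot when nobody uses it."""
        slot = self.slots[index]
        slot[2] -= 1
        if slot[2] == 0:
            self.slots[index] = None
```

In this model, opening a file that is already in core never overflows the table; only a reference to a new file does, which matches the message being tied to the number of distinct files in use.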


PROBABLE CAUSE 


The system was not created with a large enough i-node 
table to support the maximum number of open files on 
the system. 


SUGGESTED ACTION 


Increase the number of entries currently allocated for
the i-node table (i-nodes) in the system description
file (.cf--described in config(1M) in the
Administrator Reference Manual). Generate and boot a
new system.
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Insufficient main memory. System halted 


MEANING 

As the kernel is booting it dynamically allocates
memory. When you see this message, you are being
informed that the kernel did not find enough memory
(based on its configured size) to complete the boot.
The kernel has stopped the boot and shut down.


PROBABLE CAUSE


Insufficient main memory in system. 


SUGGESTED ACTION 


Either the kernel must be reconfigured, or your 
system must have more memory boards added. 
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invalid hsdtint tape cmd = %x 


MEANING


The high speed disk/tape controller board
(HSDTINT) returns this error. The system sent an
invalid command to the cartridge tape drive
controller. The %x string will give the hexadecimal
number of the command the HSDTINT board receives.
This message is accompanied by another message,
tape error: rc(%x). The rc stands for return code,
where %x is a hexadecimal value which is an offset
into a string table. The message:

tape error: %s

provides the explanation of the hex value. See the
entry for tape error: rc(%x) for a list of the possible
values for the string %s.


PROBABLE CAUSE 


Hardware failure. 


SUGGESTED ACTION 


Contact the Support Center.


logical address that generated fault %x


SEE: addr regs: %x %x %x %x %x %x %x %x
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lost process = %s 


MEANING


This identifies a process that was lost due to an
interrupt routine. It is returned with other messages
which indicate the specific nature of the problem.


SEE: Slave cpu in slot %x dead and Timeout on runq


mac %xh bus error addr = %xh pc = %xh


MEANING 


An address in the multi-bus adapter card (MAC) 
address space that is valid has been accessed and 
generated an error. This indicates that something is 
out of the ordinary. The message implies a bus error 
in an access of a location in multibus space, not in the 
shared RAM area. 


The variables provide this information: 


• mac %xh--the number of the multi-bus adapter card
  (MAC). It will be 0 through 7: card 0 is on the far
  right, and card 7 is on the far left.

• bus error addr = %xh--the address the processor
  puts on the stack when the bus error occurs, that is,
  the address whose access caused the bus error. This
  will be in the MAC address space.

• pc = %xh--the address of the instruction that
  caused the bus error.


The printout is designed to help people who are
developing drivers for MAC card hardware identify
where they are getting bus errors and other
problems. This is accompanied by the message:

user = %x

which identifies the U page number (the physical page
number) of the data structure that describes the user
process. It comes out above the messages describing
the user bus error in more detail.
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PROBABLE CAUSE 


Bus error during an access of an address in the 
multi-bus portion of the MAC address space, not in 
the shared RAM portion of that space. Possible failure 
of multi-bus card. 


SUGGESTED ACTION


Your driver should properly handle the bus error and
continue running. Troubleshoot the multi-bus card
and contact the Support Center.


memfault erraddr = %xh %s %s %s

check bits = %xh syndrome bits = %xh byte select = %xh

%s pmb_priority = %xh

MEANING 

These three messages are returned together after the
kernel has detected a memory fault (correctable or
uncorrectable). They can also indicate a bus error.


In the first message, the strings will consist of
these values:

• %xh
  this is the error address
• %s #1
  "code access" or "data access"
• %s #2
  "user access" or "system access"
• %s #3
  "read-modify-write" or "read"


The second message gives debug data (check bits,
syndrome bits, and byte select).


If there is a serious, persistent problem, the string
(%s) beginning the third message will be either
uncorrectable or single bit. The value of pmb_priority
(%xh) will be a hex number indicating the primary
memory bus priority at the time of the fault.
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PROBABLE CAUSE 

Possibly a memory RAM chip failure. This will 
necessitate replacing the board. 

SUGGESTED ACTION 

Record this information in the system log. When
regular system operation continues, the contents of
this message will be useful if the problem recurs. If
the problem persists, contact the Support Center.


memfree %x 


MEANING 


The kernel has tried to free a physical page in memory
with an address of zero. This is a common failure mode
for the kernel to get into when it is having problems,
and this routine helps detect the failure mode more
immediately.


PROBABLE CAUSE 

If you get this message, the system is very suspect at 
the time: the kernel is essentially garbled, although it 
can limp along for minutes at a time in this condition. 
Probable hardware or software failure. 


SUGGESTED ACTION 


Shut down the system and reboot. If the problem 
persists, contact the Support Center.
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No Archive tape drive on this controller 


MEANING 

An attempt was made to write to a tape drive, but 
apparently none was configured on the controller used 
to issue the command. 

PROBABLE CAUSE 

No archive tape drive is configured, or a hardware or 
software failure is returning the error. 

SUGGESTED ACTION 

Connect an archive tape drive and reissue the
command, or reissue the command through another
controller with an archive tape drive configured.
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no interrupt routine 


MEANING 


The kernel received an interrupt from one of the auto 
vectors, went out and polled the boards that should be 
responsible for that interrupt, and found no board 
that claimed to have generated the interrupt. 
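
The polling step can be sketched as follows. This Python fragment is an illustration only: the board records (name, level, pending) and the function name are hypothetical, and the real kernel's dispatch tables are not described in this manual.

```python
def handle_autovector(level, boards, console):
    """Poll the boards wired to interrupt `level` and service the first claimant.

    `boards` is a list of dicts with the hypothetical keys "name", "level"
    and "pending". Returns the name of the board that claimed the
    interrupt, or None after logging the console message.
    """
    for board in boards:
        if board["level"] == level and board["pending"]:
            board["pending"] = False     # service and acknowledge
            return board["name"]
    # Every candidate was polled and none admitted to the interrupt.
    console.append("no interrupt routine")
    return None
```

An interrupt that no board claims is what produces the message: the request line was asserted, but by the time the boards were polled none reported a pending interrupt.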


PROBABLE CAUSE 

Hardware problem. If this message is returned 
because of a hardware failure, you are more likely to 
see it at boot time. But, since it deals with 
communications, it can occur anytime. 

SUGGESTED ACTION 


Shut down the system and reboot. If this does not 
clear the problem, contact the Support Center. 
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No nine track tape drive on this controller 


MEANING 


If a cartridge tape drive is being accessed as a nine 
track tape drive, this message is returned. 


PROBABLE CAUSE 


Configuration error. 


SUGGESTED ACTION


Reconfigure the device mode under the /dev directory
to properly reflect the hardware configuration for
your system.


no partial block read/write 


MEANING 


A request was made to read an unauthorized block 
size. Blocks must equal one or 10 Kbytes: they cannot 
be fractions of these amounts or the system will refuse 
to operate on them. 


PROBABLE CAUSE 

This often results when a user-written program
requests an unauthorized block size. The message can
also be returned because of corrupted data.


SUGGESTED ACTION


Check the program that was running when the error
was encountered and make sure it uses authorized
block sizes. Correct the program if necessary.
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No swap space for exec args 


MEANING 


During exec processing there is insufficient swap
space to temporarily hold the passed arguments. A
memory error is returned to the caller.


PROBABLE CAUSE 


The system has not been tuned to match its workload, 
so the workload is taxing the system’s swap space 
capacity. 


SUGGESTED ACTION 


Either reduce the workload or increase the swap space 
to accommodate the workload. Before increasing the 
swap space, backup all files that have been created or 
changed since the system was last installed. To 
increase the space, follow the procedure described in 
the Installation and Verification Guide. 
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no file 


MEANING 


The kernel has detected that the open file table is full 
and a new reference to a file has failed. 


PROBABLE CAUSE 


The system has not been tuned to match its workload, 
and the workload is now taxing the system’s open file 
table capacity.


SUGGESTED ACTION 


Reduce the workload or rebuild the kernel specifying 
a larger open file table. Check files in the system 
description file .cf. The full pathname is 
usr/sys/.cf/system. 


See config(1M) in the Administrator Reference Manual
for a discussion of the parameter files and a
description of how to build and boot a kernel.
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nothing on response que 


MEANING 


The master CPU (MCPU) has received an interrupt 
indicating that a request is waiting on the response 
queue. It checked the queue and found no request. 
This is an extra error check the MCPU performs. The 
condition should never occur. 


PROBABLE CAUSE 


Hardware failure. 


SUGGESTED ACTION 


If the processor is hung, shut down the system and
reboot. Contact the Support Center if this problem
persists: it may be necessary to replace the disk
controller.
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open tape error: write protected 


MEANING 


Returned when you first attempt to open a nine track
or cartridge tape drive in write mode, but the drive is
write protected.


PROBABLE CAUSE 


Write-protected drive or tape drive problems. 


SUGGESTED ACTION 


Disable the drive’s Write Protect switch. 
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out of text 


MEANING 


The kernel has detected that an attempt to allocate a
shared text structure failed because there is a lack of
available structures.


PROBABLE CAUSE


The system has not been tuned to match its workload, 
so the workload is taxing the system’s text table 
capacity. 


SUGGESTED ACTION 


Either reduce the workload or rebuild the kernel 
specifying a larger text table to accommodate the 
workload. See config(1M) in the Administrator 
Reference Manual for a discussion of the parameter 
texts, and for a description of how to build and boot a 
kernel.


panic:


MEANING 


This is a general prefix attached to some messages. It 
tells you that the kernel cannot deal with the situation 
it is encountering, and means the system is going 
down. 


When the kernel encounters a panic situation, it
attempts to sync the disks, then loops. If, in the
process of the disk sync, another panic is generated,
the kernel will print the Double Panic string on the
console.


In any case, the kernel will enter an uninterruptible 
tight loop, forcing a reboot. 
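

The sequence just described (print the panic, sync the disks, loop, and print Double panic if a second panic arrives during the sync) can be modeled in a few lines. This Python sketch is a simplified model under stated assumptions: the Kernel class, the sync_disks callback, and the halted flag standing in for the tight loop are all hypothetical.

```python
class Kernel:
    """Toy model of panic handling with double-panic detection."""

    def __init__(self, sync_disks):
        self.sync_disks = sync_disks   # callable taking the kernel object
        self.in_panic = False          # set once the first panic starts
        self.halted = False            # stands in for the tight loop
        self.console = []

    def panic(self, message):
        if self.in_panic:
            # A second panic while the first is being processed: do not
            # try to sync again, just report it and stop.
            self.console.append(f"Double panic: {message}")
            self.halted = True
            return
        self.in_panic = True
        self.console.append(f"panic: {message}")
        self.sync_disks(self)          # may itself trigger another panic
        self.halted = True
```

For example, if the sync step hits a swap I/O error after a "dup free" panic, the console shows "panic: dup free" followed by "Double panic: IO err in swap". The flag is the important part of the design: it keeps a failing sync from re-entering the sync and looping through panic processing indefinitely.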


Here is a complete list of the messages that may be 
returned with a panic prefix, with a short description 
of each. The messages themselves are examined at 
greater length at their appropriate places in the text 
of this manual. 


• no fs

  The kernel has attempted a consistency check of
  the in-core free-block and i-node counts but could
  not find the in-core super block.

• Timeout table overflow

  A timeout is called to arrange that a function is
  called within a specific time frame. The panic
  routine is called if the entry won’t fit in the timeout
  table. THIS IS A CONFIGURABLE PARAMETER.

• no imt

  A file system known to be mounted was not found in
  the mount table.

• no mem for slave

  During system initialization, UNIX determined it
  did not have sufficient memory to boot a slave
  processor.

• iinit

  iinit is called once very early in initialization. It
  reads the root’s super block and initializes the
  current date from the last modified date. The panic
  routine is called if iinit cannot read the super
  block. This usually is the result of a hardware
  problem.

• lost text

  A pointer to the text of a process was lost.

• no procs

  This message should occur only during system
  initialization, if for some reason the kernel was
  unable to make a process slot entry for either init or
  the swapper. This usually indicates a hardware
  failure.

• ill attempt to unlock runq

  An attempt has been made to unlock the run queue,
  only to find that the queue was not locked.

• ill attempt to unlock tas

  An attempt has been made to unlock the test and set
  semaphore, only to discover that it was not locked.

• exit on proc 0, 1, or 2

  Processes 0, 1, and 2 should never call exit(). If
  one or more of them does, this panic will occur.

• no uarea

  An attempt was made to free the user memory area.
  No process claimed to be using the area, yet the
  area was in use.

• bflush: bad free list

  During the unmount or syncing of a file system, the
  in-core buffers must be written to disk. bflush() is
  the routine called to do this. If the routine
  encounters an extremely bad pointer, it issues this
  panic.

• process raw error

  This is a catch-all panic message that indicates
  procraw() has a problem getting either its tables or
  its memory management routines in order. This will
  be seen whenever raw I/O has a serious problem. It
  usually indicates a hardware problem.


Several panic messages are returned by memory 
management software: 


• bad mem free-list

  The counter used to keep track of free memory has
  gone negative. This is returned by the memall
  routine, which allocates memory.

• lost mem

  While in the process of allocating memory, the free
  memory counter has gone negative, a serious
  condition. This is returned by the memall routine,
  which allocates memory.

• dup alloc

  Duplicate memory pages have been allocated. This
  is even more serious than the previous message. It
  is returned by the memall routine, which allocates
  memory.

• bad mem free

  The kernel is trying to free memory that is out of
  range. This is returned by the memfree routine,
  which frees memory.

• dup free

  The kernel is trying to free a duplicate page. This
  is returned by the memfree routine, which frees
  memory.
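

The five memory management panics are all consistency checks on the page allocator's own bookkeeping. The Python sketch below shows where such checks could sit in a toy allocator. It is an assumption-laden model, not the kernel's algorithm: only the names memall and memfree come from this manual, while the class, its sets and counter, and the exact condition chosen for each panic (in particular for lost mem) are illustrative.

```python
class KernelPanic(Exception):
    """Stands in for a call to the kernel's panic routine."""


class PageAllocator:
    """Toy page allocator with the consistency checks described above."""

    def __init__(self, first_page, npages):
        self.first = first_page
        self.npages = npages
        self.free = set(range(first_page, first_page + npages))
        self.allocated = set()
        self.free_count = npages       # counter kept alongside the free set

    def memall(self):
        """Allocate one page; return its number, or None if none are free."""
        if self.free_count < 0:
            raise KernelPanic("bad mem free-list")
        if self.free_count == 0:
            return None                # caller would wait; not a panic here
        if not self.free:
            # The counter says pages remain, but none can be found.
            raise KernelPanic("lost mem")
        page = min(self.free)
        self.free.remove(page)
        if page in self.allocated:
            raise KernelPanic("dup alloc")
        self.allocated.add(page)
        self.free_count -= 1
        return page

    def memfree(self, page):
        """Return one page to the free set."""
        if page not in range(self.first, self.first + self.npages):
            raise KernelPanic("bad mem free")
        if page in self.free:
            raise KernelPanic("dup free")
        self.allocated.discard(page)
        self.free.add(page)
        self.free_count += 1
```

None of these checks can fire while the bookkeeping is intact, which is why each panic points to corrupted kernel data rather than to an ordinary shortage of memory.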


SUGGESTED ACTION


For any of these messages, the best course of action is 
to shut down the system and reboot. If any of the 
conditions causing these errors persist, contact the 
Support Center. 
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panic: bad mem free 


MEANING


The kernel has attempted to free a memory page that is
outside the range of physical memory determined to be
valid during system initialization.

This indicates that the kernel can no longer manage
the memory successfully, a condition which normally
should never occur.


PROBABLE CAUSE 


None. 


SUGGESTED ACTION


Shut down and reboot the system using the procedure
described in config(1M) in the Administrator
Reference Manual.
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panic: bad mem free-list 


MEANING 

The kernel has detected that the count of available
memory pages has gone negative, indicating that more
memory has been allocated than should be available.

This indicates that the kernel can no longer manage
the memory successfully, a condition which normally
should never occur.


PROBABLE CAUSE 


None. 


SUGGESTED ACTION 


Shut down and reboot the system. 
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panic: bflush: bad free list 


MEANING 

The linked list of free I/O buffers is corrupted. The 
processor has halted. 

PROBABLE CAUSE 


Hardware or software problems. 


SUGGESTED ACTION 


Shut down and reboot the system. Check any device 
drivers that have not been debugged completely. 
Make sure that the configuration information in the 
system description file (.cf--see config(1M) in the 
Administrator Reference Manual) is correct. 


If none of these attempts succeeds in clearing the
problem, suspect bad hardware and contact the
Support Center.


panic: dup alloc 


MEANING 

The kernel has detected that the memory allocation 
algorithm, which hashes to a free memory page, has 
yielded an already allocated page. 

This indicates that the kernel can no longer manage
the memory successfully, a condition which normally
should never occur.


PROBABLE CAUSE 


None. 


SUGGESTED ACTION 


Shut down and reboot the system. 
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panic: dup free 


MEANING 


An attempt has been made to free an already freed 
memory page. 


This indicates that the kernel can no longer manage 
the memory successfully, a condition which normally 
should never occur. 


PROBABLE CAUSE 


None. 


SUGGESTED ACTION 


Shut down and reboot the system. 
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panic: exit on proc 0,1, or 2 


MEANING 

There are three system processes (0, 1 and 2--the two 
swappers and the init process) which must be running 
for the kernel to run. Under no normal circumstances 
should one of these processes exit. If one of them does 
exit, you get this message, and the system halts 
itself. 

PROBABLE CAUSE 


Hardware or software failure.


SUGGESTED ACTION 


Shut down the system and reboot. If the problem 
persists, contact the Support Center. 


panic: iinit


MEANING 


During system initialization, the kernel cannot read 
the super-block for the root file system. 


PROBABLE CAUSE 


A media hardware error (for example, a bad block on a 
disk) or the kernel is configured for a root device 
which does not exist. 


SUGGESTED ACTION 


If you have just rebuilt the kernel by following the 
procedure in config(1M) in the Administrator 
Reference Manual, then you can reboot the system in 
manual mode by using /unix.old as the kernel. When 
the system is operational, check the system 
description file (.cf, also described in config(1M)) for 
errors. 


If you have not rebuilt the kernel, contact the 
Support Center for assistance in determining the 
status of your disk. Your disk may have to be 
reformatted. ; 




panic: ill attempt to unlock runq
panic: ill attempt to unlock tas 


MEANING 
Both of these messages mean that the master CPU 
(MCPU) has illegally attempted to exit a critical region 
it is not in: either the run queue critical region or the 
test and set critical region, depending on the 
message. 


PROBABLE CAUSE 


Hardware problem. 


SUGGESTED ACTION 


Shut down the system and reboot. If the problem 
persists, contact the Support Center.


panic: ill attempt to unlock tas


SEE: panic: ill attempt to unlock runq


panic: IO err in swap


MEANING 


The concept of swapping is critical to both swapping 
and demand paging kernels. Exceptions (errors) 
detected during these procedures usually result in 
some type of system panic. 


This informs you that the swap device driver has 
detected an I/O error during an attempt to read or 
write swap space. 

This indicates that the kernel can no longer manage 
the memory successfully, a condition which normally 
should never occur. 


PROBABLE CAUSE 


Media hardware error, such as a bad block on a disk. 


SUGGESTED ACTION 


Change the location of the swap device to a different 
section on the current pack or replace the disk pack 
with another. 


If this alleviates the problem, the error was caused by 
a bad spot on the disk pack. 


If the problem persists, suspect disk drive and/or
controller problems. Contact the Support Center.
Meanwhile, try booting from a different disk drive. 




panic: lost mem 


MEANING 

The system tried to allocate free memory but could not 
find any more. 

PROBABLE CAUSE 


Hardware or software problems. 


SUGGESTED ACTION 
Shut down and reboot the system. 


Check any new device drivers that have not been 
debugged completely. Also check any UNIX system 
device drivers that have been modified without 
authorization. Lastly, check the system description 
file to make sure its information is correct. 


If none of these clears the problem, suspect bad
hardware. Contact the Support Center.


panic: lost text


MEANING 


A pointer to the text of a process was lost. 


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Shut down the system and reboot. If the problem 
persists, contact the Support Center.


panic: no fs


MEANING 


The in-core super-block of a mounted file system 
cannot be found. The processor has halted. This 
should never happen. 


PROBABLE CAUSE 


Hardware or software problems. 


SUGGESTED ACTION 
Shut down and reboot the system. 


Check any new device drivers that have not been 
debugged completely. Lastly, check the system 
description file (.cf--described in config(1M) in the 
Administrator Reference Manual) to make sure its 
information is correct.


If none of these clears the problem, suspect bad 
hardware. Contact the Support Center.


panic: no imt


MEANING 

A mount point was not found in the system mount table 
when traversing a file system boundary. The 
processor has halted. This should never happen. 


PROBABLE CAUSE


Hardware or software problems.


SUGGESTED ACTION


Shut down and reboot the system. Check any new
device drivers that have not been debugged
completely.


If neither of these clears the problem, suspect bad 
hardware. Contact the Support Center.


panic: no mem for slave.


MEANING 


When the kernel initializes itself, it counts the number 
of slave processors. Each slave processor takes up a 
certain amount of main memory for its own code and 
data space. The kernel verifies during initialization 
that each slave’s required memory is present. If there 
is not enough main memory available to complete the 
boot, this message is returned to the console. 


PROBABLE CAUSE 


Insufficient main memory. 


SUGGESTED ACTION 


Shut down the system and reboot. When the system is 
back up, examine the system description file (.cf--
described in config(1M) in the Administrator Reference
Manual) to find the amount of memory allocated for the 
slave CPUs. Reconfigure the master kernel to reduce 
main memory requirements.


panic: no procs


MEANING 


A process table entry cannot be found during a fork 
when it is known that an entry is available. During 
forking of a new process, the initial code determines 
that a process structure is available for the new 
process, but, during actual creation, no such 
structure can be located. 


PROBABLE CAUSE 


None. This is one of several conditions which should
never occur and are trapped at miscellaneous locations
within the kernel.


SUGGESTED ACTION


Shut down and reboot the system. If the problem 
persists, contact the Support Center.


panic: nou area


MEANING 


This indicates that the data structures that the kernel 
uses to describe a process have either been corrupted 
or used in an improper manner. 


PROBABLE CAUSE 


Hardware or software failure.


SUGGESTED ACTION


Shut down the system and reboot. If the message is 
displayed again, contact the Support Center.


panic: Timeout table overflow


MEANING 

The system timeout table, which is used to implement
software interrupts, has overflowed while attempting
to add another entry. The processor has halted.


PROBABLE CAUSE

Insufficient number of entries allocated for the call- 
out table in the system description file. 
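
As a rough illustration of why a fixed-size call-out table can overflow, the following Python sketch models the table as a bounded list. The class name, method name and sample table size are assumptions made for this example; they are not taken from the kernel source.

```python
class CalloutTable:
    """Toy model of a fixed-size call-out (timeout) table."""

    def __init__(self, size):
        self.size = size      # number of entries configured for the table
        self.entries = []     # pending (ticks, name) pairs, soonest first

    def add(self, ticks, name):
        """Schedule `name` to run after `ticks` clock ticks."""
        if len(self.entries) >= self.size:
            # The kernel panics at this point; the sketch raises instead.
            raise OverflowError("Timeout table overflow")
        self.entries.append((ticks, name))
        self.entries.sort(key=lambda entry: entry[0])


table = CalloutTable(size=2)   # hypothetical, deliberately tiny table
table.add(10, "retry_io")
table.add(5, "flush")
try:
    table.add(1, "one_too_many")
    overflowed = False
except OverflowError:
    overflowed = True
```

In this model, raising the configured size is the only way to accept more simultaneous timeouts, which is the same remedy the entry suggests for the real table.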

SUGGESTED ACTION 

Reboot the system. If the condition persists, increase
the number of entries allocated for the call-out table
(calls) in the system description file--.cf described in
config(1M) in the Administrator Reference Manual.


pc = %x, usp = %x, sr = %x


SEE: addr regs: %x %x %x %x %x %x %x %x


Power failure. Starting recovery sequence


MEANING 


The kernel has detected that the system power supply 
is being interrupted, while an uninterruptible power 
source (UPS) is present. It is starting to enter a 
timeout period, and will shortly update the disks, 
shut down the UPS and bring the system to a halt. 
This is an indication that you are about to go down. 


PROBABLE CAUSE 


Probably due to an unpredictable power interruption 
such as a weather related power surge in your area or 
someone inadvertently disconnecting the power cable 
from the machine. 


SUGGESTED ACTION 


Check your power connections and source and try 
rebooting the system. If your power source is all 
right, and you continue having power problems, turn 
the machine off and contact the Support Center.


process id = %xh


SEE: addr regs: %x %x %x %x %x %x %x %x


process name that was running = %s


SEE: addr regs: %x %x %x %x %x %x %x %x


process raw error


MEANING 

This is a catch-all panic message that indicates 
procraw() has a problem getting either its tables or its 
memory management routines in order. This will be 
seen whenever raw I/O has a serious problem. It 
usually indicates a hardware problem. 


PROBABLE CAUSE 


Hardware or software failure. 


SUGGESTED ACTION 


Shut down and reboot the system. If this problem 
persists, contact the Support Center.


random interrupt


MEANING 

An interrupt has been received for which the kernel 
cannot assign a source or cause. 

PROBABLE CAUSE


A 68000/68020 interrupt service problem. 


SUGGESTED ACTION 


If there are no visible consequences, continue normal 
operation. If this occurs often, shut down the system 
and reboot. If the message reappears, contact the 
Support Center.


realmem = %d


avail mem = %d


MEANING


Real memory is the number of bytes of physical
memory on the CPU. Available memory is the number
of bytes of physical memory actually available to user
processes.


PROBABLE CAUSE 


None. This message is for information only. 


SUGGESTED ACTION 


No action is required, but you may want to record 
these figures in the system log for future reference.


Resetting MAC


MEANING 


The multi-bus adapter card (MAC) driver that is 
provided is a model of how other vendors should write 
their MAC drivers. If it detects a failure with the 
MAC, it will go out, under certain circumstances, and 
actually reset the board and temporarily take it out of 
commission by resetting the multi-bus reset cycle for 
20 milliseconds--15 milliseconds longer than the 
multi-bus considers necessary. 


PROBABLE CAUSE 


Multi-bus adapter card failure. 


SUGGESTED ACTION 


None. Outside of the reset, this has no other effect on 
operation.


%s on bad dev %o(8)


MEANING 


A problem has been detected with a file on a block 
device with a major number that exceeds the number of 
block device drivers generated by the system. 


The string (%s) may be: bad block, bad count, bad
free count, no space, or out of inodes. %o is the minor
device number. The eight in parentheses means it is
an octal value.
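
Because the device number is printed in octal, it may help to convert it before comparing it with decimal listings. A minimal Python sketch follows; the helper name and the sample value are hypothetical.

```python
def octal_to_decimal(text):
    """Convert a number printed in octal, such as the %o field, to an int."""
    return int(text, 8)


minor = octal_to_decimal("17")   # octal 17 is decimal 15
```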


PROBABLE CAUSE


Explanations of the values of %s:


• bad block


A block number is less than zero or greater than the
maximum permissible for the system. The file
system is corrupted.


• bad count


There is a bad free count or a bad inode count. The
file system is corrupted.


• bad free count


The free count field in the super-block is wrong.
The file system is corrupted.


• no space


There is no more space in the logical file system.


• out of inodes


There is an insufficient number of inodes available.


SUGGESTED ACTION 


If the file system is corrupted, check it using 
fsck(1M) (see the entry for it in the Administrator 
Reference Manual). Try shutting down and rebooting 
the system. If you cannot alleviate this condition, 
contact the Support Center.


sem already locked


MEANING 

The semaphore (sem) is a signal set by software to 
synchronize cycles in a multi-processor system. This 
message states that the master CPU (MCPU) is trying 
to reenter a critical region improperly. 


PROBABLE CAUSE 


Hardware problem.


SUGGESTED ACTION


Shut down the system and reboot. If this does not 
clear the problem, contact the Support Center.


Slave cpu in slot %x dead


MEANING 


The master CPU (MCPU) was trying to get into a 
critical region, but a slave processor occupying the 
region refused to leave. The MCPU makes a reasonable 
number of attempts to enter the region, then times out 
and shuts down the slave that is holding the region.


This condition will return the Timeout on runq and
lost process = %s messages, providing you with more
information on what was happening when the shutdown
began. The lost process message gives the invocation
name of the process (an ASCII string) that the slave
was running when it was locked out by the CPU.


PROBABLE CAUSE 


Hardware failure. 


SUGGESTED ACTION 


Shut down the system and reboot. If this does not 
clear the problem, contact the Support Center. See 
Timeout on runq, Timeout on send cmd., and Timeout 
on tas for more information.


Slave cpu not responding. slotnum=%xh


MEANING 


The kernel tried to load code into a slave CPU, and the 
slave refused to accept it. 


PROBABLE CAUSE 


Bad slave file, bad hardware or a bus arbitration
problem. 


SUGGESTED ACTION 


Check the slave file. Make sure the code being 
downloaded is correct. Try shutting down the system, 
reseating the board identified in the message, and 
rebooting. If none of these clears the problem, contact 
the Support Center.


%s pmb_priority = %xh


SEE: memfault erraddr = %xh %s %s %s


Spurious interrupt from MAC slot %xh


MEANING 

A multi-bus adapter card (MAC) board has generated 
an interrupt. When the kernel attempts to find out 
why, the MAC board shows no record of having sent 
the interrupt. The %x shows the slot number of the 
MAC that the kernel thinks is at fault. 

PROBABLE CAUSE


Hardware failure.


SUGGESTED ACTION


Shut down the system and reboot. If the problem 
persists, contact the Support Center.


spurious interrupt


MEANING 

This refers to a 68000/68020 spurious interrupt 
vector. A spurious interrupt is one that does not 
follow the interrupt protocol for the 68000/68020 
family processors. If the processor sees an interrupt 
coming in that does not satisfy that protocol, it 
vectors it as a spurious interrupt. 


PROBABLE CAUSE 


Hardware failure. 


SUGGESTED ACTION 


Contact the Support Center. 




Spurious runq timeout.


MEANING 


There is a timeout on the critical region which is used 
to synchronize multiple processors. If the master CPU 
(MCPU) finds that one of the slave processors has 
been in the critical region for longer than a certain 
time period, it assumes the slave is executing 
improperly. The MCPU has a sequence of events it 
goes through. If during this sequence it detects that 
the slave processor has exited the critical region, the 
MCPU issues this message. This is not considered a 
hard failure, or a failure of any kind, but it does give 
important information that may be useful when 
diagnosing future system problems. 


You should never receive this message. 


PROBABLE CAUSE 


It is easier to say what this problem is not than to say 
what it is. For instance, it is not a hardware problem. 
And it is not necessarily a functional problem: it could 
be caused, for instance, by an arbitration problem in 
the bus. 


The MCPU goes through the lost process = routine, 
which provides you with information useful for system 
analysis. This yields (almost always) the invocation 
name of the process, an ASCII string, and brings 
down the system gracefully. 


SUGGESTED ACTION 


If you are receiving this message, try increasing the
timeout (if it happens on more than one system) or
look into one system in which the slave processors are
spending more time in the critical region than they
should. Sometimes this may be an indication that
arbitration on the processor memory bus is not
working correctly. So it may be an indication of
performance problems in hardware or software, but
not an indication of functional problems.


Spurious tas timeout


MEANING 


Similar to Spurious runq timeout, except that in this
case the critical region is the one that is built to
support the trap 2 instruction.


PROBABLE CAUSE


Hardware failure.


SUGGESTED ACTION


Shut down the system and reboot. If this does not 
clear the problem, contact the Support Center. 




stray interrupt at %x


MEANING 


The kernel received a hardware interrupt for which it
could not determine a source. 


PROBABLE CAUSE 


This error can be caused by a device specified at an 
incorrect vector in the system description file (.cf-- 
described in config(1M) in the Administrator 
Reference Manual). It can also be caused by a 
hardware failure.


SUGGESTED ACTION 


Check the .cf file for an incorrect vector. Make the 
necessary corrections, then rebuild the kernel by 
following the procedure described in config(1M). If 
the system description file is correct, or the vector is 
0, suspect hardware problems. Make a note of the 
value of the string where the interrupt occurred and 
contact the Support Center.


stray interrupt on auto vec %xh


MEANING 


The 68000/68020 family has eight priority levels (0 
through 7) associated with interrupt levels auto 
vectors 0 through 7. This message states that an
interrupt was received that could not be assigned to
any of these sources. The interrupt was branded a
stray and this message was returned to the console.


%xh gives the auto vector the interrupt came in on. 


The message merely informs you that there are 
conditions in the system that are not normal. If you 
start seeing lots of stray interrupts, you could have a 
performance problem. 


PROBABLE CAUSE 


The kernel logs this and continues. Some of the 
conditions reported by these messages are minor-- 
there may be, for instance, some noise on the line, or 
a minor hardware fault. The messages are logged for 
diagnostic purposes in case a hard failure does occur 
shortly thereafter. They also inform you that there 
are things going on in the system which are not 
normal. 


SUGGESTED ACTION 


Contact the Support Center.


System Mode,


SEE: addr regs: %x %x %x %x %x %x %x %x


tape empty free list


MEANING 


This is the tape equivalent of the disk empty free list 
message. After an interrupt routine, the master CPU 
(MCPU) has taken a request off the response queue 
and placed it on the free list. The CPU performs one 
final check of the free list to make sure everything is 
okay. During this check, it fails to see the item it just 
placed on the list and concludes something is wrong. 


PROBABLE CAUSE 


Usually a hardware problem. 


SUGGESTED ACTION


Shut down the system and reboot. If the problem
persists, contact the Support Center: the board may
have to be replaced.


tape error: rc(%x)


MEANING 


The kernel failed to understand a return code sent to
it.


PROBABLE CAUSE 


Here are possible values of the string %s for
CARTRIDGE DRIVE HSDTs.


• system sector bad
• illegal disk/tape command
• invalid sector number
• disk not formatted
• invalid system sector information
• drive not ready
• physical drive not mounted
• no such logical drive
• phy/log drive out of range
• invalid data in request
• no partial block r/w
• cartridge not in place
• drive not ready
• write protected
• end of media
• data error
• file mark detected
• no data detected
• retry count exceeded
• beginning of media
• tape drive hung during r/w
• tape drive was reset
• illegal command


Here are possible values of the string %s for NINE
TRACK HSDTs.


• system sector bad
• illegal disk/tape command
• invalid sector number
• disk not formatted
• invalid system sector information
• drive not ready
• physical drive not mounted
• no such logical drive
• phy/log drive out of range
• invalid data in request
• no partial block r/w
• cartridge not in place
• drive not ready
• write protected
• end of media
• data error
• file mark detected
• no data detected
• retry count exceeded
• beginning of media
• tape drive has hung up
• tape drive has been reset
• tape not online
• data overrun
• corrected error
• no nine track tape drive online


SUGGESTED ACTION


The action you will have to take depends on the value 
of the identifying string.


tas


MEANING 


The test and set (tas) bit identifies resource 
availability. The MC68000 processor uses an 
uninterruptible read/modify/write cycle which allows 
information to be accessed, modified and rewritten 
while ensuring that the resource is not used by
another device at the same time. In a multi-processor 
system this cycle must be emulated, and the tas bit 
helps accomplish this. 


If the bit gets to the kernel side, it means that it has 
been locked (set at 1) for such a long time that 
something must be wrong. 


The high speed disk/tape (HSDT) board can lock the 
tas bit under two circumstances: 


• when it is pulling requests off the request queue
from the master CPU (MCPU)


• when it is putting a request on the queue.
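
The idea of a test and set bit guarded by a timeout can be sketched in a few lines of Python. This is only a model under stated assumptions: the class and function names are invented for the example, a threading lock stands in for the uninterruptible read/modify/write cycle, and the attempt limit stands in for whatever timeout the kernel actually applies.

```python
import threading


class TasBit:
    """Emulates an atomic test-and-set on a single bit."""

    def __init__(self):
        # The lock stands in for the uninterruptible bus cycle.
        self._guard = threading.Lock()
        self._bit = 0

    def test_and_set(self):
        """Set the bit to 1 and return its previous value in one step."""
        with self._guard:
            old = self._bit
            self._bit = 1
            return old

    def clear(self):
        """Release the resource by resetting the bit to 0."""
        with self._guard:
            self._bit = 0


def acquire(bit, max_attempts):
    """Try to obtain the bit; report failure after max_attempts tries."""
    for _ in range(max_attempts):
        if bit.test_and_set() == 0:
            return True      # the bit was free and is now held by the caller
    return False             # held too long by someone else: treat as an error


bit = TasBit()
first = acquire(bit, 3)      # the bit starts free, so this succeeds
second = acquire(bit, 3)     # nobody cleared the bit, so this gives up
bit.clear()
third = acquire(bit, 1)      # free again after clear()
```

A caller that receives False is in the position the message describes: the bit has stayed locked for longer than is considered reasonable.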


PROBABLE CAUSE


Hardware or software failure.


SUGGESTED ACTION 


Shut down the system and reboot. If the problem 
persists, contact the Support Center. 




tas_sem already locked 


MEANING 

There is a critical region available to user processes 
via execution of the trap 2 instruction which simulates 
the tas instruction. This message reports that a 
process is trying to enter its own critical region when 
it is already in that region. 


PROBABLE CAUSE 


Hardware problem.


SUGGESTED ACTION


Shut down the system and reboot. If this does not 
clear the problem, contact the Support Center.


Timeout on runq


MEANING 


This reports a situation wherein, after the CPU has 
timed out on the critical region, the error handling 
code continues to see the slave processor in the 
critical region. It concludes that the slave processor 
is not leaving the critical region. 


PROBABLE CAUSE 


Probable hardware failure of the slave CPU board. 


SUGGESTED ACTION 


The operating system automatically determines which 
slave processor has the run queue locked, then shuts 
down that slave processor hardware, resets it and 
holds it off the system. Two other messages will come 
out at about the same time: 


Slave cpu in slot %x dead--this identifies which
slave cpu caused the timeout on the run queue.
The value %x is the board slot number (a
hexadecimal number between 0 and 5), and is
the number you will have to report to the 
Support Center Engineer if you are arranging 
to have the board replaced. 


Lost process = --the name of the process that 
was running when the timeout occurred. This 
will almost always be the invocation name, the 
filename from which the process was originally 
exec-ed. It will be an ASCII string. 


If a process was executing on the slave processor that
is detected bad due to the timeout on run queue, the
kernel attempts to inform the console what process was
running on that processor. It does not try to recover 
that process: the process is marked bad and all the 
resources allocated for it are no longer accessible to 
the kernel, so any memory that has been allocated to 
that process is lost. This is because the resources are 
described by a data structure that the slave CPU is 
operating out of (the U page), and it cannot be 
trusted anymore. 


This routine helps degrade the system gracefully 
instead of just shutting down the whole operation. You 
can continue running. Frequently you will get 
multiple timeout on runq messages because the 
problem may not be due to the processor that is 
detected in the critical region. It may instead be due 
to other failures elsewhere in the system--a memory 
board, the direct memory access (DMA) controller or 
the MCPU--and the system cannot recover from an 
MCPU problem. 


Whenever you get this kind of message (any one of the 
three above), consider bringing the system down, 
performing maintenance, replacing the CPU, isolating 
the process, and so forth.


Timeout on send cmd.


MEANING


The master CPU (MCPU) communicates with the slaves
under certain conditions. Generally, master and 
slave processors execute independently of one 
another, but under certain conditions the MCPU sends 
an interrupt to a slave to get the slave’s attention and 
direct it to perform some action. When it interrupts a 
slave to get its attention, the MCPU goes into a timeout 
loop while it waits for the slave’s response. If a slave 
does not answer within a reasonable amount of time, 
the MCPU times out, assumes the slave is 
malfunctioning, and prints out this message. 


The MCPU goes through the Slave cpu in slot %x dead
and lost process = %s routines, which provide
information useful for system analysis. These will
return the number of the slot containing the board
that initiated the condition, and usually the invocation
name of the process (an ASCII string).


The Timeout on runq, Timeout on send cmd., and
Timeout on tas messages give more information on how the MCPU
detected the problem, including additional detail on 
the type of problem that has been encountered. The 
error handling is much the same. 


PROBABLE CAUSE 


Hardware failure. 


SUGGESTED ACTION


Shut down the system and reboot. If this does not 
clear the problem, contact the Support Center.


Timeout on tas


MEANING 


Similar to the Timeout on runq, except that in this 
case the critical region is the one that is built to 
support the test and set instruction. 


PROBABLE CAUSE 


Hardware failure.


SUGGESTED ACTION


Shut down the system and reboot. If this does not 
clear the problem, contact the Support Center.


trap type %s, code = %x, fault register %x


Unrecognized board type in slot %xh


MEANING 

The kernel recognizes boards supported by the 
operating system. This message indicates that the 
kernel has come across a board it does not recognize.


PROBABLE CAUSE


The board could be:

• seated incorrectly

• an unsupported type

• faulty.


SUGGESTED ACTION


Shut down the system, reseat the board identified in
the message and reboot. If this does not clear the
condition, contact the Support Center.


SEE: addr regs: %x %x %x %x %x %x %x %x


User Trap Type %s, pc = %x, usp = %x, fault = %x


SEE: addr regs: %x %x %x %x %x %x %x %x


MEANING


This identifies the U page number (the physical page 
number) of the data structure that describes the user 
process involved in a bus error. It comes out above a 
message describing the user bus error in more detail. 


PROBABLE CAUSE


Bus error during an access of an address in the
multi-bus portion of the MAC address space, not in
the shared RAM portion of that space.


SUGGESTED ACTION


The driver should properly handle the bus error and
continue running. See mac %xh bus error
addr = %xh pc = %xh for more information.


WARNING: Swap space running out needed %d blocks


MEANING 


While the kernel is attempting a cleanup of swap 
space, it determines that the swap space is running 
dangerously low. 


PROBABLE CAUSE 


The system has not been tuned to match its workload,
so the workload is taxing the system’s swap space
capacity. It could also be caused by having too many
sticky-bit or very large processes running
simultaneously.


SUGGESTED ACTION 


Try first decreasing the number of sticky-bit 
processes and rebooting the system. If this does not 
clear the condition, increase the swap space. Before 
increasing the swap space, back up all files that have
been created or changed since the system was last 
installed. Once you increase the swap space by 
reconfiguring the disk, you will have to reconfigure 
the kernel to allow it to access the increased space. 
To increase the space, follow the procedure described 
in the Installation and Verification Guide. 




DISK/TAPE ERRORS 


Following is a list of the error codes returned by the 
disk/tape driver. The errors are returned as two 
bytes with each byte potentially describing a different 
error. Usually, the disk driver returns only one 
error in the upper byte and also prints out a string 
defining the error. The format of the disk error 
message is: dk(#) blk(#) rc(#) intword(#) where: 


• dk is the minor number of the device requested (in
decimal)


• blk is the logical block number on that device (in
decimal)


• rc is the two byte return code (in hex)


• intword is the interrupt word of the icb interface
(this should always be zero)


The tape driver just prints the string description.
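
A console line in this format can be picked apart mechanically. The Python sketch below is an illustration only, not part of the driver: the function name and the sample line are made up, and reading intword as hex is an assumption, since its base is not stated above.

```python
import re

# Layout assumed from the format described above:
#   dk(<decimal>) blk(<decimal>) rc(<hex>) intword(<hex, assumed>)
DISK_ERROR = re.compile(
    r"dk\((\d+)\)\s+blk\((\d+)\)\s+"
    r"rc\(([0-9a-fA-F]+)\)\s+intword\(([0-9a-fA-F]+)\)"
)


def parse_disk_error(line):
    """Return the fields of a disk error line, or None if it does not match."""
    match = DISK_ERROR.search(line)
    if match is None:
        return None
    rc = int(match.group(3), 16)
    return {
        "minor": int(match.group(1)),     # dk: minor device number
        "block": int(match.group(2)),     # blk: logical block number
        "rc": rc,                         # full two byte return code
        "upper_code": (rc >> 8) & 0xFF,   # error carried in the upper byte
        "lower_code": rc & 0xFF,          # error carried in the lower byte
        "intword": int(match.group(4), 16),
    }


# Hypothetical sample line, written for this example.
fields = parse_disk_error("dk(3) blk(1024) rc(0600) intword(0)")
```

Splitting rc into its two bytes makes it easier to look up each half separately in the list of codes that follows.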


• 1 /* system sector bad */


After the drive has been formatted, the HSDT
controller will set up the system data in sectors 0
through 17 of the reserved area. This error code
will be returned if the controller cannot write to
these sectors due to a cyclical redundancy check
(CRC) error, a data overrun, or a header search
error. Since the system cannot write information
there which is necessary for further disk I/O and
since the drives are not supposed to have any
errors on the first three tracks, either the drive
must be reformatted or, if this does not solve the
problem, the drive must be replaced.




• 2 /* illegal disk/tape command */


The command sent to either the disk or the tape is 
not acceptable. Assuming that all of the I/O is 
handled by the standard UNIX drivers in the 
kernel, this should never happen. If it does, it 
indicates either a memory or dynamic memory 
controller (DMC) problem. 


• 3 /* invalid sector number */


The sector number requested does not lie within the 
bounds of the selected logical disk or physical disk. 
UNIX utilities should not cause this to happen, but 
user programs can request an invalid sector 
number for a particular logical drive. Check sector 
zero for the starting addresses and sizes of all of 
the logical disks. Compare this information with 
the actual sector that was requested. 


• 4 /* disk not formatted */


If the disk has been formatted, this usually 
indicates that some of the data in sector zero has 
been corrupted. Format the disk. 


• 5 /* invalid system sector information */


The information in sectors 0 through 17 of the disk 
has been corrupted or the disk was never 
formatted. The spare information in sector 2, the 
skip track information in sector 4 or the list of bad 
sectors found by disktest could be bad. This could 
show up during the dsetup program when 
attempting to spare/unspare a bad sector if the 
linked list of sector numbers in sector 2 has been 
broken or dsetup’s information does not match that 
in sector 2.


• 6 /* drive not ready */


The drive is not presently ready or did not come 
ready soon enough after the system was powered up 
for its information to be stored in the controller’s 
data structures. Check the green ready light on 
the back of the drive. If it is on, check the cabling 
and reset the machine to see if this will solve the 
problem. 


• 8 /* no such logical drive */
The logical drive selected is not one of those 
specified in sector zero of the disk set up by 
dsetup. Run the dsetup program, either UNIX 
version or standalone, to see what the actual 
physical disk configuration is and check the logical 
disks listed against the request in error. 

• 9 /* phy/log drive out of range */

The physical drive requested is greater than the 
maximum number that is allowed for that controller. 
Only four physical disks are allowed per controller, and
only two drives on a nine track tape drive controller.


• 0xa /* invalid data in req */


The data in the request to the controller has some 
incorrect information. This could include: 


1. the device is neither a disk nor a tape 


2. buffer address does not start on a long word 
boundary 


3. the requested count is greater than the 
maximum for that device 


4. the count is zero


5. the controller is attempting to spare/unspare a
sector on cylinder zero 
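The five conditions above can be sketched as a single validation routine. This is a hedged illustration, not the controller's actual code: the request structure, its field names and the error constants are hypothetical, and a long word is assumed to be four bytes.

```c
#include <stdint.h>

/* Hypothetical device types and return codes, for illustration only. */
enum dev_type { DEV_DISK, DEV_TAPE, DEV_OTHER };

#define ERR_NONE    0x0
#define ERR_BAD_REQ 0xa   /* invalid data in req */

/* Hypothetical request layout; the real request block is not
 * described in this manual. */
struct io_request {
    enum dev_type dev;   /* target device type */
    uintptr_t buf_addr;  /* main memory buffer address */
    uint32_t count;      /* requested count */
    uint32_t max_count;  /* maximum count for the device (assumed known) */
    int spare_op;        /* nonzero for a spare/unspare request */
    uint32_t cylinder;   /* cylinder of the sector being spared */
};

/* Applies the five checks listed above, in order. */
int check_request(const struct io_request *r)
{
    if (r->dev != DEV_DISK && r->dev != DEV_TAPE)
        return ERR_BAD_REQ;  /* 1. neither a disk nor a tape */
    if (r->buf_addr % 4 != 0)
        return ERR_BAD_REQ;  /* 2. not on a long word boundary */
    if (r->count > r->max_count)
        return ERR_BAD_REQ;  /* 3. count exceeds the device maximum */
    if (r->count == 0)
        return ERR_BAD_REQ;  /* 4. count is zero */
    if (r->spare_op && r->cylinder == 0)
        return ERR_BAD_REQ;  /* 5. spare/unspare on cylinder zero */
    return ERR_NONE;
}
```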


• 0xb /* no partial block r/w */


The main memory buffer is not a long word multiple 
in size. The byte count is not a multiple of 400 hex. 
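A user program can guard against this error by checking the byte count before issuing the request. The sketch below assumes only what is stated above, that the count must be a multiple of 400 hex (1024 bytes); the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_BYTES 0x400u  /* 400 hex = 1024 bytes */

/* Returns true if the byte count is a whole number of blocks.
 * A zero count is a separate condition (see error 0xa). */
bool is_whole_block_count(uint32_t byte_count)
{
    return byte_count % BLOCK_BYTES == 0;
}
```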


• 0xc /* skip track table is full */


Only 1000 skip tracks are allowed per controller.
The full table condition may occur when attempting
to load in the information for a new physical drive.


• 0xd /* skip track error */


The information in the skip track list is invalid. 
This is usually caused by the dsetup program 
writing improper data to sector 4/5. 


• 0xe /* hit double alternate sector */


When a logical disk is designated as an FSYS type,
dsetup places a -1 in the sector ID field of any bad 
sector and then never uses that sector in the 
freelist. The standard UNIX block I/O should 
never encounter this sector, but for any utility 
that just reads consecutive sectors on the disk, or 
when using a non-UNIX operating system, these 
sectors will show up as errors. This type of 
sparing will cause an error in utilities such as dd 
and volcopy. 


• 0xf /* too many sectors */


Certain disk requests will allow only a one sector 
read/write and others will not allow any request 
that will overlap a track boundary. This error will 
be returned if the requests do not meet the criteria 
set forth for that command.


Most of the errors below are the result of some
hardware problem. Check the cables, disk, controller 
board, etc. 


• 0x10 /* hit alternate sector */


This error will be returned by disktest because, 
normally, when it is run, it does not expect any 
alternate sectors. Once the disk is set up and the 
alternate sectors are allocated, running disktest
will cause errors. 


• 0x11 /* data CRC error */


Check and spare (if necessary) the sector.


• 0x12 /* SEEK error */


The controller does five retries before reporting 
this error, so when this occurs either the disk is 
having some signal problems or the controller is not
handshaking properly. 


• 0x13 /* ready change */


Disk was logically disconnected. The disk was not 
ready when the controller tried to seek to a cylinder 
or read/write a sector. Check the cabling. 


• 0x14 /* cannot rezero */

If an error occurs during an attempt to seek to a 
cylinder, the controller will try to rezero the disk 
and send the seek command out again. This error 
indicates that the disk would not even rezero. 
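The seek recovery described for errors 0x12 and 0x14 can be sketched as follows. This is a simplified model under stated assumptions, not the controller firmware: the simulated drive, the function names, and the exact ordering of rezero and retry are all hypothetical. It assumes one initial seek followed by up to five retries, with a rezero before each retry.

```c
#define SEEK_RETRIES      5     /* retries after the first attempt */
#define ERR_SEEK          0x12  /* SEEK error */
#define ERR_CANNOT_REZERO 0x14  /* cannot rezero */

/* Hypothetical stand-in for a drive, used only to exercise the logic. */
struct sim_drive {
    int seek_failures;  /* number of upcoming seek attempts that fail */
    int rezero_works;   /* nonzero if a rezero succeeds */
    int seeks_issued;   /* count of seek commands sent so far */
};

static int sim_seek(struct sim_drive *d)
{
    d->seeks_issued++;
    if (d->seek_failures > 0) {
        d->seek_failures--;
        return -1;
    }
    return 0;
}

static int sim_rezero(struct sim_drive *d)
{
    return d->rezero_works ? 0 : -1;
}

/* Returns 0 on success, or the error code that would be reported. */
int seek_with_recovery(struct sim_drive *d)
{
    int attempt;

    for (attempt = 0; attempt <= SEEK_RETRIES; attempt++) {
        if (sim_seek(d) == 0)
            return 0;
        /* Seek failed: rezero the disk before sending the seek again. */
        if (sim_rezero(d) != 0)
            return ERR_CANNOT_REZERO;
    }
    return ERR_SEEK;  /* every attempt failed */
}
```

In this model, a drive that never completes a seek but still rezeroes yields 0x12 after six seek commands, while a drive that will not rezero yields 0x14 after the first failed seek.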

• 0x18 /* overrun */

This will occur only if there is a direct memory
access (DMA) problem. It means that either the
disk or the tape transfer was started and, for some
reason, the DMA had problems causing an overrun.
The software on the controller is designed to handle 
this so that the error should never get back to the 
main CPU, but it may be returned if your system 
has multiple HSDTs and an older revision of the 
main CPU PROMs that incorrectly set the DMA 
channels. Check the PROM revision level and 
perform a memory test. 


• 0x1a /* not writable */


The disk’s write protection is enabled. Turn the
Write Protect switch off.


• 0x1c /* header search error */


The controller has sent out a request to read a 
specific sector and the header ID on that sector 
does not match the one that was supposedly 
selected. It may be on the wrong cylinder or wrong 
track but the seek command never reported an 
error. The track information on the disk could also 
be invalid. For both of these conditions, check the 
disk and controller board for problems. 


• 0x1d /* timeout on rw */

The disk has a one-second timeout for any
read/write operation. This could occur because the 
disk was disconnected during the read/write 
operation; that is, the cable or the disk had a signal 
failure and could not return after completion of the 
read/write. The tape could also go offline during a 
read/write operation resulting in a timeout. 

• 0x1e /* wrong gap length */

Format the disk.


• 0x1f /* abnormal termination */


The disk interface is checked out prior to
performing a seek. This error is returned if the
system cannot communicate with the disk channel at
this time. This indicates a hardware failure. If
rebooting does not fix this, try replacing the
controller and/or the interface card.


• 0x20 /* cartridge not in place */


• 0x21 /* drive not ready */


Check the cabling and power connections.


• 0x22 /* write protected */


Check the Write Protect lock on the cartridge or the 
Write Enable ring on a nine track tape. 


• 0x23 /* end of media */


• 0x24 /* data error */


This will occur if the tape contains data that is 
unreadable or written in the wrong format, or if an 
attempt was made to read past the final end-of-file 
mark. Check the format (q11 or q24) on an archive
drive or the density on a nine track. An 
untensioned tape can sometimes result in this type 
of error. 


• 0x25 /* file mark detected */


• 0x26 /* no data detected */


There is no readable data on the tape: the tape was 
never written on. 


• 0x27 /* retry count exceeded */


After eight unsuccessful tries to read or write the
tape, the archive tape drive returns this error.
This usually means that the tape is bad.


• 0x28 /* beginning of media */


Normal return code from a status command at the 
beginning of the tape. Not returned to the user. 


• 0x29 /* tape drive has hung up */ (Nine Track)
  0x29 /* tape drive hung during r/w */ (Archive)


This will be returned if the drive times out. The 
drive will take two minutes to time out on a r/w. 
Check the tape cartridge, tape motor, cabling and 
possibly the controller interface. 


• 0x2a /* tape drive has been reset */ (Nine Track)
  0x2a /* tape drive was reset */ (Archive)


If the tape drive hangs up, it is reset.


• 0x2b /* tape not online */ (Nine Track)


The drive is logically disconnected. Check the 
cabling and power connections. 


• 0x2c /* data overrun */ (Nine Track)


The block size on the tape is bigger than the 
number of bytes requested or the DMA channel is 
hung up. Run a memory test, because
this message should not be returned to you.


• 0x2d /* corrected error */ (Nine Track)


The nine track tape drive will attempt to correct
certain errors during a read operation. If the
correction succeeds, it is flagged with this message,
which means that the data has been read correctly.
This will not normally be reported to the console.
On a write operation, this error means that the data
was not written correctly, probably due to a bad
tape.


• 0x2e /* illegal command */


The command sent to the drive is not available for 
the archive tape drive. You should never see this 
message if you are using the standard UNIX 
utilities. If you do see it, the request data was 
corrupted. Check the DMA and memory. 


• 0x2f /* invalid io interface */


An archive tape controller is connected to a nine 
track interface board or vice versa. 


• 0x30 /* power fail interrupt */


If there is a power failure and an uninterruptible 
power source (UPS) backup is connected to the 
system, all tape requests are aborted. 


• 0x31 /* no nine track tape drive online */ (Nine Track)


A nine track request was sent for a device that isn’t 
nine track. This will only be seen if the first tape 
request after a boot is of the improper type. Check
the interface cards and verify that your logical
devices match the physical devices.


• 0x32 /* no tape burst detected */ (Nine Track)


On a standard nine track tape there is a burst ID at 
the beginning of the tape. This error indicates that 
the burst ID was either not written on the first 
write or not read on the first read. Some drives do 
not write burst IDs in some modes: character major 
device number 21 does not check for burst IDs; 20, 
the default nine track character device, does. 
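When logging or diagnosing these return codes from a program, a small lookup table can turn a numeric code into its message text. The sketch below is illustrative only: it covers a handful of the codes listed in this section, and the table and function names are hypothetical rather than part of any system header.

```c
#include <stddef.h>

/* One entry per return code; the text is the message shown in this
 * section. Only a sample of the codes is included here. */
struct status_entry {
    unsigned code;
    const char *text;
};

static const struct status_entry status_table[] = {
    { 0x02, "illegal disk/tape command" },
    { 0x03, "invalid sector number" },
    { 0x06, "drive not ready" },
    { 0x11, "data CRC error" },
    { 0x12, "SEEK error" },
    { 0x22, "write protected" },
    { 0x23, "end of media" },
    { 0x27, "retry count exceeded" },
    { 0x32, "no tape burst detected" },
};

/* Returns the message text for a code, or NULL if the code is not
 * in the sample table. */
const char *status_text(unsigned code)
{
    size_t i;

    for (i = 0; i < sizeof status_table / sizeof status_table[0]; i++) {
        if (status_table[i].code == code)
            return status_table[i].text;
    }
    return NULL;
}
```

A caller would print the returned text when it is not NULL and fall back to printing the raw hexadecimal code otherwise.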